BenevolentAI unites technology with human intelligence to re-engineer drug discovery and deliver life-changing medicines. We have developed the Benevolent Platform®, a drug discovery platform built on powerful data foundations with state-of-the-art machine learning and AI technology. Our technology empowers scientists to decipher the vast and complex code underlying human biology, find new ways to treat disease and personalise medicines to patients. Benevolent has active in-house R&D drug programmes in disease areas such as neurodegeneration, immunology, oncology and inflammation and has research and commercial collaborations with leading pharmaceutical and research organisations. The company is headquartered in London with a research facility in Cambridge (UK) and a further office in New York.
Who We Are
We are an eclectic bunch at Benevolent, united by our belief that innovative thinking and purposeful technology can truly change outcomes for the better. Our mission is to re-engineer drug discovery and deliver life-changing medicines for patients in need and we do this by applying AI, machine learning and other advanced technologies to reinvent the ways drugs are discovered and developed. We strive to bring together unique skills and perspectives across biology, chemistry, engineering, AI research, informatics, precision medicine and drug discovery.
The Role
As the Lead Site Reliability Engineer, you will build a team around you and have line management responsibilities whilst remaining hands-on and steering the direction of Benevolent's cutting-edge infrastructure. You will lead a team of up to seven engineers building and maintaining the cloud and Kubernetes-based platforms that form the foundation of our drug discovery pipeline. You must be a strong communicator who can lead by example and guide your team to deliver robust, secure and reliable infrastructure solutions.
Your team will work alongside other infrastructure squads to promote industry best practices and ensure the software is resilient enough for our scientists to rely upon. You will also contribute to diverse areas such as cloud services, container technologies, authentication, network topology, sharded databases, scalable web services, interfaces to external data sources and APIs.
Primary Responsibilities
- Co-ownership of the overall Benevolent cloud architecture.
- Ownership of the company's site reliability goals and formulation of objectives in alignment with high-level organisational strategy.
- Approving the defined targets for SLOs and SLIs. Participation in the negotiations to define SLAs.
- Driving large-scale infrastructure projects to delivery through coordination with engineering and security teams in order to achieve a common goal.
- Incident response management. Ownership of incident response and disaster recovery policies.
- Influencing the direction of infrastructure technology advancements. Designing around challenges associated with large-scale distributed systems and driving the harmonisation of the technology support layer to promote reuse across the organisation.
- Conceiving and driving infrastructure solutions to achieve business continuity goals.
- Constantly refining processes and working practices to remove obstacles and empower engineering teams to supply our users with ample infrastructure solutions.
- Designing infrastructure solutions and maintaining their specifications.
We are looking for someone with
- Evidence of creative thinking and problem solving, confidently applying novel strategies to move projects to important decision points quickly and efficiently.
- Excellent oral and written communication skills, e.g. the ability to tailor the complexity of communications as required whilst maintaining clarity.
- Ability to work under pressure, manage different projects and deliver to defined timelines.
- Experience successfully leading a Site Reliability, DevOps or engineering team with excellent communication skills and the ability to forge productive relationships and collaborations both internally and externally.
- Excellent understanding of AWS and Kubernetes. Knowledge of scalability challenges associated with containers, distributed systems and large-scale web applications.
- Experience with programming languages (any; bonus points for Python/Java/Go/C++).
- Comfortable with availability out of working hours in the event of a high severity incident.
- Experience with monitoring and alerting solutions (for example Grafana/Prometheus).
- Extensive knowledge of cloud networking architecture, cloud operations, automation and orchestration.
- Good knowledge of network protocols and components such as BGP, TCP, HTTP/S and Load Balancing.
Together, we envision a world in which no disease goes untreated. If you are benevolent, curious, want to tackle real world problems and are willing to embrace new ideas, hit that 'apply' button and join us.