The Service Operations Team (SOC) are responsible for the primary operation & availability of HealthHero's platforms and services. The team are responsible for identifying and resolving issues in production (ideally before they become visible to customers) and working with the wider engineering community to ensure we chase down and mitigate areas of risk. The team is part of the Infrastructure and Data Engineering department and will report to the Director/VP of the department.
As a Tech Lead you will play a crucial part in defining and building this team along with deciding how it interacts with the rest of the business. You will be working alongside other teams developing not only our products but our automation, telemetry and infrastructure.
This is the first role being hired in this space so you'll get the chance to lean on your career's knowledge to set up a world class team and create ways of working to suit. Our SOC will not be behind closed doors - you'll get to interact with the developers, the architects and the security team and play a crucial part in making sure we bake operational smarts into our systems from the start. If this sounds like the sort of fun you like then read on!
Key Responsibilities
As a SOC Tech Lead, you'll be responsible for:
- Defining the SOC team, hiring key individuals, and setting up its processes and tooling
- Monitoring our production environments and reacting fast to prevent or reduce customer-visible impact
- Accurate escalation of incidents when required
- Communication of production issues to key stakeholders, working alongside Service Management
- Troubleshooting, reproducing and mitigating issues in our production environments
- Incident management of high severity issues impacting our sites and services
- Creating automation and tooling to improve our processes
- Supporting services prior to go-live through pre-launch reviews
- Designing and participating in events such as Wargames to test our operational response and identify areas of weakness in our platforms
You'll have an advantage if you have:
- Team leadership experience
- Incident Management experience
- Strong troubleshooting, problem-solving and investigative skills
- Experience operating a production environment
- Excellent communication skills
- Experience of working in an agile environment to deliver software
- Experience of scripting and automation
- Experience working in a cloud native environment
- Experience working with C#/.NET, Python, Ansible
- Experience of ELK, Splunk, Prometheus, Graphite, Grafana
- Experience of AWS, Azure
- Experience operating a production Windows or Linux environment
- Familiarity with Jira
Some extra info that's important to us
- Pension scheme
- Access to HealthHero healthcare services
- Medical Cash Plan
- Discount schemes
- Extra day of holiday for your birthday
- Free breakfast and snacks in the office
As an Equal Opportunities employer, we welcome applications from all sections of the community. We are also happy to make any reasonable adjustments at any stage of the recruitment process should you need them; please just let us know.
We take your data privacy seriously and commit to processing your data in line with GDPR guidelines. Please see our Privacy Notice, which details how we manage your applicant data. By proceeding through the applicant stage, we understand that you agree with how we will manage your data.
We'll need to carry out background checks relevant to the role, which will include a Right to Work check, employment references and a DBS check.