Site Reliability Engineer
CV-Library is one of the leading job boards in the UK, attracting over 10 million monthly visits and boasting a CV database of over 16 million UK workers. This is why we are the job board of choice for over 10,000 customers who are looking to hire the best talent for their business. Our U.S. brand, Resume-Library, is one of the fastest-growing job boards in North America, and we have ambitious plans for further global expansion.
We are looking for a DevOps / Site Reliability Engineer to join an existing, highly skilled team. You will deliver in a DevOps environment, supporting production applications, back-office services, cloud services and platform improvements, and acting as both advisor and coach to other team members. You will have experience of diagnosing and fault-finding incidents using data insights, and of liaising closely with the delivery teams to report progress and gather data, intelligence and information as requested. Within this role you will be a key member of our team and will help scope the ongoing technical strategy across both CV-Library and Resume-Library. We are looking for someone to uphold and encourage best practice as well as a proactive, solutions-focused approach to DevOps.
Key responsibilities
Skills and Experience Essential:
- Be responsible for the performance and reliability of the company's global online platforms. Work within the Technology Team to troubleshoot service issues via proactive and reactive monitoring, alerts and logging, and via service requests communicated through Jira, email, Sprint meetings and Stand Ups
- Enhance existing services' tech stacks and configurations to improve site performance and reduce issues through forensic analysis, and be responsible for availability management, latency, efficiency, change management, monitoring, emergency response, and capacity planning
- Record data and manage issues with a view to participation in reviews and Blameless Post-Mortems
- Explore and deliver on opportunities to implement automation and scripting of services, environments and toolsets
- Liaise closely with the application Developers, Sprint Teams and the Development Managers reporting progress, gathering data, readings and information as requested
- Design, implement, calibrate and validate to company procedures and processes alongside routine service, emergency service and product updates as required
- Create a bridge between Development and Operations teams by applying an 'as-a-service' mindset to system administration, management and build topics. Gain exposure to systems in both staging and production, as well as all technical teams. Take part in work with software development, support, IT operations and on-call duties.
- Be an advocate for change with an innovative Growth Mindset, be an engaging and collaborative member of the Technology Team, and actively support your colleagues in Operations and the wider team.
Desirable (Using or Supporting):
- Infrastructure-As-Code - Terraform or similar
- Monitoring platforms - ELK, Grafana, Prometheus or similar
- Agile / Jira (Scrum or Kanban)
- Understand the mechanics of high-traffic, high-availability websites and related back-office services in order to support and evolve pragmatic solutions
- Able to explain technical details to non-technical stakeholders.
- Cloud Native Computing Foundation (partners)
- Google Cloud Platform
- Observability/APM Platforms (preferably New Relic)
- PHP / Java / Go / Perl / Python