This job has expired

Site Reliability Engineer - Public Cloud

Employer
J.P.Morgan
Location
UK
Salary
Competitive
Closing date
23 Feb 2021

Sector
Technology & New Media
Contract Type
Permanent

Job Details

This role requires a wide variety of strengths and capabilities, including:
  • Deep understanding of SRE philosophy, technologies, platforms and tools, SLA management, incident resolution, and automation
  • Mastery of application, data and infrastructure architecture disciplines
  • Command of architecture, design and business processes
  • Keen understanding of financial control and budget management
  • Expertise in working in partnership with colleagues throughout the firm, and in leading collaborative teams to achieve common goals
  • Hands-on experience managing operations of large-scale, internet-centric production environments for application or infrastructure services serving tens to millions of end users.
  • Prior experience in large-scale internet companies/technologies, where uptime and continuous availability were core to the business.
  • Work with Architecture to design reusable patterns to deploy to applications, provide governance around adoption, and influence application development teams on roadmaps and designs.
  • Identify and partner with Infrastructure teams and AD teams to implement automation opportunities to drive down toil and reduce technical debt.
  • Apply standards of cloud compliance to application design to achieve reliability
  • Understanding of networking and cloud technologies, for example security, load balancing, and network routing protocols.
Responsibilities:
  • Implements SRE frameworks to support global multi-cloud environments, and ensures the highest level of SLA through operational excellence
  • Provides failure analysis / root cause analysis when required
  • Provides support to develop & improve the quality of technical engineering documentation
  • Provides support to drive the maturity of the software development lifecycle
  • Provides quality control of engineering deliverables
  • Provides technical consultation to product management
  • Performs deployment, administration, management, configuration, testing, and integration tasks related to the AI/ML platforms in a cloud environment
  • Helps to develop new cloud engineering strategies and implementations for the firm
  • Champions a DevOps model so that services are automated and elastic across all platforms
  • Helps coach and mentor less experienced team members.
  • Writes operation documentation and knowledge base of known issues with solutions
  • Participates in 24x7 SRE on-call rotations and escalation workflows.
Qualifications:
  • Bachelor's degree in Computer Science, Information Technology, or equivalent technical field
  • Enterprise cloud infrastructure experience (AWS, Azure, GCP) in a mission-critical environment
  • In-depth OS experience (RHEL, Ubuntu, Windows Server) with strong debugging, troubleshooting, and problem-solving skills
  • Experience in site reliability engineering using one of the following languages: Python, Java, PowerShell, shell scripting or Go
  • Hands-on experience with cloud-based technologies and tools, especially in deployment, monitoring and operations, such as Datadog, Prometheus, Splunk, Elasticsearch, Grafana
  • Strong working knowledge of modern development technologies and tools such as Agile, CI/CD, Git, Terraform and Jenkins.
  • Deep knowledge of Internet protocols and web services technologies such as HTTP, DNS, TCP/UDP, SOAP, JSON and REST
  • Good understanding of networking protocols and cybersecurity best practices in a cloud environment
  • AWS or GCP certification is highly desirable