Site Reliability Engineering Manager
- Employee status: Permanent
- Closing date: Ongoing
- Reference number: 43115
We’re Sky, Europe’s biggest entertainment brand. Think top-quality shows. Breaking news. Innovative tech. Must-have products. Careers here mean the freedom and support you need to make an impact – pushing boundaries, creating solutions, hitting targets. And as part of our close-knit team, you’ll enjoy plenty of benefits. Plus, experiences you’ll only find at Sky.
This role is an exciting opportunity to join us and work within our Technology Team.
Our benefits include Sky Q, a generous pension and private health care, plus access to over 12,000 LinkedIn Learning courses to support your development. And if that’s not enough, our award-winning Osterley campus boasts six subsidised restaurants, a cinema, a gym, and much more.
To find out more about working with us, search #LifeatSky on LinkedIn, Twitter or Instagram.
As a Site Reliability Engineering Manager, you will be responsible for a team of engineers with a deep operational understanding of distributed systems, and quality engineers who work to find out how those systems break.
- Collaborate with development teams to provide a path to production that supports development objectives.
- Deliver overall path-to-production frameworks that bring a standard OTT platform to development teams.
- Implement infrastructure as code, monitoring as code, everything as code.
- Track and implement corrective actions towards achieving 99.995% availability.
- Lead through influence to drive improvement across a widely distributed team.
- Collaborate with Architects and Software Engineers to improve the resilience of Sky OTT systems.
- Conduct formal operational readiness reviews of proposed software designs, controls, and test plans.
- Perform incident analysis and provide recommendations, including pushing for their delivery.
- Experience in one or more technical domains as a systems engineer with a development background; well versed in systems architecture.
- Demonstrated breadth of experience, shown through role progression across technology architecture, design, and development. Resiliency and wide knowledge skills: you must bring an understanding of technology and systems concepts and methods.
- Strong background in system administration and architecture, and in the configuration and management of large-scale platforms (virtualization, cloud, Unix, Java, Puppet, NoSQL databases, Kubernetes, Docker).
- Strong background in monitoring and logging of large-scale platforms (Nagios, Prometheus, Splunk, Icinga, etc.).
- Proven experience of implementing change to enforce high availability on large-scale platforms (e.g. circuit breakers, fail-fast, fail-silent and stubbed fallbacks).
- Strong technical and analytical skills, and good communication skills.
- Understanding of Agile and a deep understanding of DevOps practices within it (e.g. Continuous Delivery).
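To give a feel for the 99.995% availability target mentioned above, here is a rough sketch of the downtime such a target allows. The function name `downtime_budget_minutes` and the choice of 365-day and 30-day periods are illustrative assumptions, not a description of how Sky measures availability.

```python
def downtime_budget_minutes(availability_percent: float, period_days: float) -> float:
    """Return the downtime, in minutes, that an availability target allows.

    Hypothetical helper for illustration only: it assumes availability is
    measured as simple uptime over wall-clock time in the given period.
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = 1 - availability_percent / 100
    minutes_in_period = period_days * 24 * 60
    return unavailable_fraction * minutes_in_period


if __name__ == "__main__":
    target = 99.995  # availability target from the role description
    per_year = downtime_budget_minutes(target, 365)
    per_month = downtime_budget_minutes(target, 30)
    print(f"Yearly budget at {target}%: {per_year:.2f} minutes")
    print(f"30-day budget at {target}%: {per_month:.2f} minutes")
```

Under these assumptions the target leaves roughly 26 minutes of downtime per year, or just over 2 minutes in a 30-day month.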
So, what are you waiting for? Apply now for a chance to forge your own career path and be brilliant as part of a bright, talented team.
Just so you know: if your application is successful, your appointment will be subject to receiving a positive outcome from your criminal record check.
We’re happy to discuss flexible working.
It’s our people that make Sky Europe’s leading entertainment company. That’s why we work hard to be an inclusive employer, so everyone at Sky can be their best.
A job you love to talk about