
Site Reliability Engineering Manager
Job Description
About the role
The Site Reliability Engineering Manager manages a team within the site reliability engineering organisation, ensuring BT delivers the service performance, reliability and availability that internal and external customers expect. The role contributes to cross-functional engineering discussions to achieve scalable, measurable, fault-tolerant and cost-effective cloud services.
What you’ll be doing
1. Coordinates teams through the implementation of new software development life cycle automation tools, frameworks and continuous integration/continuous delivery (CI/CD) code pipelines; ensures teams follow best practices with a focus on the re-use of application code; demonstrates consistent software delivery practices; and produces CI/CD platform solutions using cloud, infrastructure as code and container technologies
2. Collaborates with engineering leadership in supporting the development of the architectures and practices that should be adopted in order to deliver on engineering and operational goals
3. Provides technical support to product teams to optimise availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning
4. Solves problems relating to mission-critical services and builds automation to prevent recurrence, with the goal of automating the response to all non-exceptional service conditions
5. Executes new builds of infrastructure tooling that improves reliability across the entire product surface area, dealing with massive distributed scale
6. Coordinates work with development teams during the design phase to build and perform infrastructure upgrades that support application availability and reliability
7. Manages the current-state solution portfolio to identify deficiencies caused by ageing technologies used by the application or by misalignment with business requirements
8. Leads a team in the implementation of approaches that scale systems sustainably through mechanisms like automation and evolves systems by pushing for changes that improve reliability and velocity
9. Oversees the delivery of infrastructure as code software to improve the availability, scalability, latency, and efficiency of services
10. Oversees the implementation of robust monitoring and alerting systems
11. Manages the queue and support processing to ensure early warning of support issues
12. Owns and leads retrospective and preventive actions after each high severity production incident
13. Champions, continuously develops and shares with the team knowledge of emerging trends and changes in site reliability engineering best practices and industry standards
14. Coaches talent and manages others to develop capabilities and ensure performance through upskilling, development and recruitment
15. Implements ways to improve working processes within the area of site reliability engineering responsibility
Essential Skills / Experience
Desirable Skills / Experience
Our Package
Company benefits
Working at BT Group
Hiring in countries
Brazil
Hungary
India
Ireland
United Kingdom

