Transparency ranking: 9.4/10

Job Description

Why this job matters

The Site Reliability Engineering Specialist plays a critical role in safeguarding BT’s ability to deliver exceptional service performance, reliability, and availability across its digital platforms. In today’s fast-paced, cloud-driven AI environment, customers expect seamless experiences, and this position ensures those expectations are met by driving scalable, robust, resilient, and cost-effective solutions. By enabling cross-team collaboration and implementing automation, monitoring, and resilience strategies, the specialist not only minimizes downtime and operational risk but also accelerates innovation and system evolution. This role is pivotal in maintaining BT’s reputation for reliability while empowering the business to adapt quickly to emerging technologies and deliver consistent value to customers worldwide.

What you’ll be doing

  • Implement and operate CI/CD and SDLC automation using cloud services, infrastructure as code (IaC), GitOps patterns and containers, following established engineering and security practices.
  • Contribute to test planning and execution with delivery and QA to meet quality and time goals; help define practical coverage and schedules.
  • Automate to reduce toil and MTTR (mean time to resolve): scripts, runbooks and guardrailed tasks that remove repetitive work and improve recovery speed.
  • Participate in Tier 2/3 incident response: diagnose, mitigate and recover; capture learnings and drive preventive follow-ups.
  • Implement and tune observability (metrics, logs, traces, dashboards, alerts) to improve signal quality and reduce noise.
  • Apply SRE fundamentals with teams: define/maintain SLIs and SLOs with error budgets; propose data-driven reliability improvements.
  • Harden release reliability: keep pipelines stable, safe and reliable; identify configuration drift and remediate quickly.
  • Support on-call readiness: runbook stewardship, change/rollback safety, participation in DR/failover exercises and game days.
  • Identify reliability risks across services and environments; raise issues early and support mitigations and control adoption.
  • Collaborate with developers, platform, operations and partners; document clearly and support peer learning.

Skills required for the job

Core SRE & Engineering Skills

  • Strong expertise in end-to-end observability and monitoring platforms (e.g., Dynatrace) to understand system health, performance trends, and reliability of business-critical applications.
  • Proficiency in one or more programming languages (e.g., Java, Python) with the ability to write production-quality automation and tooling.
  • Hands-on experience with cloud platforms (AWS, Azure, or GCP) and operating distributed systems in cloud and hybrid environments.
  • Firm grasp of software architecture, design patterns, and microservices-based systems.
  • Practical experience with CI/CD pipelines, DevOps practices, and continuous testing to support fast, reliable delivery.
  • Strong capability in infrastructure as code and pipeline management, enabling consistent, scalable, and secure deployments.

Reliability, Operations & Continuous Improvement

  • Proven ability to apply Site Reliability Engineering principles, including automation, toil reduction, incident learning, and reliability-driven system improvements.
  • Experience analysing complex, distributed systems to identify performance, resilience, and stability issues.
  • Ability to support 24x7 operational environments, working effectively with stakeholders, backend teams, and managed service partners during priority incidents.
  • Strong analytical, reporting, and presentation skills, enabling clear communication of operational insights, risks, and improvement opportunities.
  • Demonstrated mindset for business process improvement, using data and automation to drive efficiency and reliability gains.
  • Adaptability to evolving industry trends and emerging technologies, with a continuous learning and growth mindset.

AI-Driven Observability & AIOps

  • Understanding of AIOps fundamentals, including cross-domain telemetry ingestion, event correlation, topology and context modelling, and remediation augmentation.
  • Experience with AI-assisted and agentic observability, using intelligent techniques to detect anomalies, correlate signals, and accelerate incident resolution.
  • Capability in AI-driven alerting and noise reduction, designing contextual, business-impact-aware alerts and leveraging machine learning to prioritise alerts and reduce alert fatigue.

Nice to have

  • AI‑assisted incident workflows: LLM‑generated summaries/timelines or suggestion prompts in collaboration tools; context‑aware runbooks under human‑in‑the‑loop controls.
  • AIOps capabilities: event correlation, dynamic topology/context modelling, impact‑aware alerting and alert noise reduction features in modern observability platforms.
  • Chaos engineering: exposure to controlled fault injection with tools like Gremlin/Litmus/Chaos Mesh; translating findings into tangible reliability improvements.
  • MLOps: model drift/freshness concepts and high‑level SLIs/SLOs for ML services; basic approaches to monitoring model health signals.

Our leadership standards

Looking in:
Leading inclusively and safely
I inspire and build trust through self-awareness, honesty and integrity.
Owning outcomes
I take the right decisions that benefit the broader organisation.

Looking out:
Delivering for the customer
I execute brilliantly on clear priorities that add value to our customers and the wider business.
Commercially savvy
I demonstrate strong commercial focus, bringing an external perspective to decision-making.

Looking to the future:
Growth mindset
I experiment and identify opportunities for growth for both myself and the organisation.
Building for the future
I build diverse future-ready teams where all individuals can be at their best.

About us

BT Group was the world’s first telco and our heritage in the sector is unrivalled. As home to several of the UK’s most recognised and cherished brands – BT, EE, Openreach and Plusnet, we have always played a critical role in creating the future, and we have reached an inflection point in the transformation of our business.

Over the next two years, we will complete the UK’s largest and most successful digital infrastructure project – connecting more than 25 million premises to full fibre broadband. Together with our heavy investment in 5G, we play a central role in revolutionising how people connect with each other.

While we are through the most capital-intensive phase of our fibre investment, meaning we can reward our shareholders for their commitment and patience, we are absolutely focused on how we organise ourselves in the best way to serve our customers in the years to come. This includes radical simplification of systems, structures, and processes on a huge scale. Together with our application of AI and technology, we are on a path to creating the UK’s best telco, reimagining the customer experience and relationship with one of this country’s biggest infrastructure companies.

Change on the scale we will all experience in the coming years is unprecedented. BT Group is committed to being the driving force behind improving connectivity for millions and there has never been a more exciting time to join a company and leadership team with the skills, experience, creativity, and passion to take this company into a new era.

A FEW POINTS TO NOTE:

Although these roles are listed as full-time, if you’re a job share partnership, work reduced hours, or any other way of working flexibly, please still get in touch.

We will also offer reasonable adjustments for the selection process if required, so please do not hesitate to inform us.

DON'T MEET EVERY SINGLE REQUIREMENT?

Studies have shown that women and people who are disabled, LGBTQ+, neurodiverse or from ethnic minority backgrounds are less likely to apply for jobs unless they meet every single qualification and criterion. We're committed to building a diverse, inclusive, and authentic workplace where everyone can be their best, so if you're excited about this role but your past experience doesn't align perfectly with every requirement in the Job Description, please apply anyway. You may be just the right candidate for this or other roles in our wider team.

Company benefits

25 (UK, increasing with service) / 21 (India) days annual leave + bank holidays
Adoption leave – 18 weeks full pay, 8 weeks half pay, 6 months statutory
Bank holiday swaps
Buy or sell annual leave – buy up to 5 days/year pro rata
Carer’s leave – Two weeks paid leave
Cinema discounts
Coaching
Compassionate leave
Complimentary Medical Services
Cycle to work scheme
Employee assistance programme
Employee discounts
Enhanced maternity leave – 18 weeks full pay, 8 weeks half pay, 6 months statutory
Enhanced paternity leave – 18 weeks full pay, 8 weeks half pay, 6 months statutory
Enhanced pension match/contribution
Enhanced sick pay – 3 months
Faith rooms
In house training
L&D budget – sponsored accreditation available for certain professions
Learning platform – internal and external learning content via Degreed
LinkedIn learning license – unlimited access
Lunch and learns
Mental health platform access – Silvercloud
Mentoring
Neo-natal leave
Open to job sharing
Open to part time work for some roles
Optional unpaid leave
Private GP service – 24/7 virtual GP access for UK colleagues
Referral bonus
Returnship
Salary sacrifice
Share options
Shared parental leave
Travel loan
Volunteer days – 3 volunteer days per year
Reservist leave
Fertility treatment leave
Pregnancy loss leave
Pregnancy support
On-site catering
On-site barista
On-site shower
Modern office
Collaboration spaces
Private booths
On-site wellness room

Working at BT Group

Company employees:

100,000 across BT Group (24,000 at BT Business)

Gender diversity (m:f):

74.3:25.7 (BT Group)

Hiring in countries

Brazil

Canada

Hong Kong

Hungary

India

Poland

Singapore

South Korea

Spain

United Kingdom

Awards & Accreditations

Family Friendly (Flexa awards 2025)
Career Progression (Flexa awards 2025)