BT Group • IND-Bengaluru East-RMZ Ecoworld

Site Reliability Engineering Specialist

Employment type:  Full time

Job Description

Location: Bengaluru

Last date of application: 22-May-2026

About the role

As a Site Reliability Engineer (SRE) within the Network Operations team, BTI International, you will be responsible for ensuring the reliability, resilience and performance of our Global Platforms including Global Fabric. You will collaborate closely with Engineering, Product and ASG teams to embed SRE principles such as automation, observability and proactive incident reduction into day‑to‑day operations. By improving how we monitor, maintain and evolve our services, you will help reduce risk, improve service quality and increase operational efficiency. Through this role, you will support BTI International’s strategy by enabling stable, secure and scalable platforms that support business growth, accelerate delivery of new capabilities, and protect customer experience.

What you will be doing (Role Accountabilities)

  • Provide end-to-end SRE ownership for the Global Fabric service, ensuring high availability, performance, and resilience.
  • Enable safe, automated changes to production through CI/CD pipelines, GitOps practices, and automated testing.
  • Operate, maintain, and continuously improve monitoring and observability using tools such as Dynatrace, Prometheus, and Elasticsearch.
  • Lead complex incident troubleshooting across multiple systems and services.
  • Act as a third-line escalation point, participating in a 24/7 on-call rota.
  • Manage incidents through ServiceNow and track defects and continuous improvements in Jira.
  • Drive automation initiatives using Ansible and scripting to reduce operational toil and improve reliability.
  • Mentor and support L2 engineers, enhancing runbooks, troubleshooting standards, and overall operational readiness.
  • Contribute to PI planning, supporting Agile delivery practices.

What you’ll need to succeed (Skills & Experience)

  • Experience supporting large-scale, high-availability services in an ISP / NaaS / network-centric environment.
  • Experience operating customer‑facing web apps and APIs with strong focus on availability, latency, and error handling.
  • Ability to troubleshoot end‑to‑end CF journeys across UI, APIs, middleware, and downstream fulfilment systems.
  • Solid understanding of event‑driven integrations (Kafka/event streams), including impact of lag, message loss, and replay on customer journeys.
  • Experience delivering changes using GitOps and CI/CD pipelines (including release validation and rollback awareness).
  • Experience with designing, implementing, and managing observability solutions using Dynatrace.
  • Experience with observability tooling: Dynatrace, Prometheus, Elasticsearch, plus event/messaging platforms such as Kafka.
  • Strong observability skills for CF services, including journey‑based monitoring and synthetic checks.
  • Knowledge of Infrastructure as Code tools like Terraform or Ansible.
  • Automation experience with Ansible and at least one of Python / Go / Bash.
  • Working knowledge of incident/problem management in ServiceNow and delivery tracking in Jira (Scrum / PI planning).

BT Group’s Behaviours

Customer First

Prioritize customer needs in every decision and action.

Challengers

Challenge the status quo and bring innovative ideas to life.

Committed

Own outcomes and deliver with integrity.

Clear

Communicate openly and simply, ensuring alignment.

Connected

Collaborate across teams to achieve shared goals.

About the role

The Site Reliability Engineering Specialist independently executes activities that help ensure BT is in the best position to deliver the service performance, reliability and availability that internal and external customers expect, by enabling cross-team engineering discussions to achieve scalable, measurable, fault-tolerant, and cost-effective cloud services.

What you’ll be doing

1. Executes the implementation of new software development life cycle automation tools, frameworks, and code pipelines (continuous integration/continuous delivery pipelines) whilst executing best practices with a focus on the re-use of application code; demonstrates consistent software delivery practices; and produces continuous integration/continuous delivery platform solutions using Amazon Web Services cloud, infrastructure as code (IaC), GitOps, and container technologies
2. Coordinates a diverse team and creates the initial test schedule to deliver all aspects of testing to time, budget and quality targets, producing outlines of solutions and defining the depth of testing required
3. Executes the implementation of automation technologies to ensure repeatability, eliminate toil, and reduce mean time to detection, resolution and repair of services
4. Proactively identifies and manages risk through regular assessment and diligent execution of controls and mitigations, proactively raising any concerns
5. Leads scale testing to measure, tune and optimise system performance
6. Executes metric/monitoring analysis that creates stability, security, and performance improvements
7. Designs, analyses, develops and troubleshoots highly-distributed large-scale production systems spanning on-prem and cloud-based hosting
8. Executes approaches that scale systems sustainably through mechanisms like automation and evolves systems by pushing for changes that improve reliability and velocity
9. Writes and delivers infrastructure as code software to improve the availability, scalability, latency, and efficiency of services
10. Implements robust monitoring and alerting systems and performs root cause analysis and post-mortems with an eye towards future prevention
11. Inspects queue and support processing to ensure early warning of support issues
12. Executes retrospective and preventive actions after each high severity production incident
13. Analyses complex systems from a reliability and resilience perspective and identifies sources of instability in distributed systems
14. Champions, continuously develops and shares knowledge with the team on emerging trends and changes in site reliability engineering best practices and industry standards
15. Mentors other site reliability engineers, helping to improve the team's abilities by acting as a technical resource

Essential Skills / Experience

Desirable Skills / Experience

Our Package

Company benefits

25 (UK, increasing with service) / 21 (India) days annual leave + bank holidays
Adoption leave – 18 weeks full pay, 8 weeks half pay, 6 months statutory
Bank holiday swaps
Buy or sell annual leave – buy up to 5 days/year pro rata
Carer’s leave – Two weeks paid leave
Cinema discounts
Coaching
Compassionate leave
Complimentary Medical Services
Cycle to work scheme
Employee assistance programme
Employee discounts
Enhanced maternity leave – 18 weeks full pay, 8 weeks half pay, 6 months statutory
Enhanced paternity leave – 18 weeks full pay, 8 weeks half pay, 6 months statutory
Enhanced pension match/contribution
Enhanced sick pay – 3 months
Faith rooms
In house training
L&D budget – sponsored accreditation available for certain professions
Learning platform – internal and external learning content via Degreed
Learning license – unlimited access
Lunch and learns
Mental health platform access – Silvercloud
Mentoring
Neo-natal leave
Open to job sharing
Open to part time work for some roles
Optional unpaid leave
Private GP service – 24/7 virtual GP access for UK colleagues
Referral bonus
Returnship
Salary sacrifice
Share options
Shared parental leave
Travel loan
Volunteer days – 3 volunteer days per year
Reservist leave
Fertility treatment leave
Pregnancy loss leave
Pregnancy support
On-site catering
On-site barista
On-site shower
Modern office
Collaboration spaces
Private booths
On-site wellness room
Open to part-time employees
Open to compressed hours

Working at BT Group

Company employees:

100,000 across BT Group (24,000 at BT Business)

Gender diversity (m:f):

74.3:25.7 (BT Group)

Hiring in countries

Brazil

Colombia

Hungary

India

Ireland

Singapore

United Kingdom

Office Locations

Awards & Accreditations

2nd - Best Workplace Culture (Flexa awards 2026)
3rd - Most loved - Large companies (Flexa awards 2026)
Top 10 - Most Family Friendly Company (Flexa awards 2025)
Top 10 - Best Career Progression (Flexa awards 2025)
