
Job Description
Recruiter: Viktoria Pálfi-Vincze
Hiring Manager: Laura O'Connor
Career Grade: D
About BT
BT Group is the UK’s leading communications group and the holding company behind some of the country’s most recognised brands – including BT, EE, Openreach and Plusnet. Our purpose is as simple as it is ambitious: we connect for good. Our customers include consumers, small, medium and large businesses, public sector organisations and other communications providers.
BT Group’s role is about setting direction, unlocking value and creating the conditions for our brands and businesses to thrive.
Having come through the most capital-intensive phase of our fibre investment, our focus now is on what comes next – simplifying how we operate, using technology and AI to work smarter, and organising ourselves to serve customers better and grow sustainably. Group teams shape strategy, policy, brand, capital allocation and transformation, helping the whole organisation perform at its best.
We have a singular culture that unites all our people: we are customer-first challengers, who are committed, clear and connected. These behaviours unite us as one team to deliver for our colleagues, our customers, our stakeholders and the country. Joining BT Group means working at the heart of a business that matters to the UK, with the opportunity to shape decisions, influence outcomes and help set the future course of one of the country’s most important companies.
About the role
As a Site Reliability Engineer (SRE) within the Network Operations team, BTI International, you will be responsible for ensuring the reliability, resilience and performance of our Global Platforms, including Global Fabric. You will collaborate closely with Engineering, Product and Service teams to embed SRE principles such as automation, observability and proactive incident reduction into day-to-day operations. By improving how we monitor, maintain and evolve our services, you will help reduce risk, improve service quality and increase operational efficiency. Through this role, you will support BTI International’s strategy by enabling stable, secure and scalable platforms that support business growth, accelerate delivery of new capabilities, and protect customer experience.
What you will be doing (Role Accountabilities)
• Own the operational reliability, performance and resilience of the Global Fabric NaaS platform.
• Support and troubleshoot microservices, APIs and integrations across the NaaS ecosystem.
• Diagnose and resolve production issues across Kubernetes-hosted applications, Linux systems, networking, Kafka, APIs and service integrations.
• Support safe, automated change into production using CI/CD, GitOps, and automated testing.
• Improve observability, monitoring and traceability across the platform using Dynatrace, Prometheus, Grafana, Elasticsearch and Kafka.
• Support BT’s move towards end-to-end tracing and service traceability, helping implement and improve synthetic monitoring, tracing and service flow visibility.
• Participate in major incident resolution, root cause analysis and post-incident improvement activities.
• Manage incidents, problems and changes through ServiceNow and track defects and improvements in Jira.
• Drive automation through Ansible, Python, Bash or similar tooling to reduce manual effort and operational risk.
• Mentor and support L2 engineers by improving troubleshooting practices, runbooks and operational readiness.
• Build strong knowledge of the end-to-end customer journey and ensure operational decisions are aligned to customer impact.
What you’ll need to succeed (Skills & Experience)
Must have:
• Strong Linux and system administration experience, including server and compute management.
• Experience deploying, supporting and troubleshooting containerised applications in Kubernetes.
• Experience using monitoring tools such as Dynatrace, Prometheus, Grafana, Elasticsearch and Kafka.
• Experience supporting large-scale, high-availability services in an ISP, telecom, NaaS or network-centric environment.
• Experience with CI/CD, GitOps and safe production deployments.
• Experience with scripting and automation using Python, Bash, Ansible or similar.
• Growth Mindset: Self-driven attitude towards learning new skills and aiding the development of others.
Desired:
• In-depth knowledge of network protocols, including BGP, IS-IS and MPLS.
• Understanding of synthetic monitoring, telemetry and end-to-end service visibility.
• Experience of resilience, disaster recovery, chaos engineering or high availability testing.
• Ability to manage incidents through ServiceNow and to track defects and continuous improvements in Jira.
BT Group’s Behaviours
Customer First: Prioritise customer needs in every decision and action.
Challengers: Challenge the status quo and bring innovative ideas to life.
Committed: Own outcomes and deliver with integrity.
Clear: Communicate openly and simply, ensuring alignment.
Connected: Collaborate across teams to achieve shared goals.
“We believe in open conversations. If selected, salary details will be shared with you ahead of your interview, so you have clarity from the start.”
At BT International, our purpose is to keep the world connected. As part of BT, we build on almost 180 years of innovation and expertise to deliver secure connectivity and digital services to some of the world’s leading multinational businesses and organisations. Our customers trust us to safeguard their data, drive their digital transformation and keep their businesses running. With colleagues on the ground across the world and supporting customers wherever they need to operate, BT International offers a truly global experience. Whether it’s about providing cloud connectivity, helping organisations collaborate, or enabling innovation in cybersecurity and digital services, you’ll be part of a team that shapes how businesses succeed in a world that is being transformed by AI. If you have the drive and ambition to make an impact on a global stage, BT International is where it happens.

