BT Group • New Bailey, Manchester, United Kingdom

Lead - Capacity & Automation (SRE)

Job Description

Why this job matters

This role is critical to the success of our Private Cloud platform. As the single accountable owner for capacity management, you will ensure that our VMware-based infrastructure is reliable, scalable, and cost-efficient—enabling the business to deliver programmes without risk of capacity-related delays. By applying Agile product ownership and Site Reliability Engineering (SRE) principles, you will transform capacity management into a proactive, data-driven capability that continuously evolves to meet business demand.

You will define and implement forecasting models, automation, and guardrails that prevent saturation and optimise resource utilisation. Your work will directly impact platform reliability, programme delivery, and financial efficiency, making this role a cornerstone of our technology strategy. Through telemetry, automation, and governance, you will provide the insights and controls that keep the Private Cloud (EC.3) resilient, cost-effective, and ready for future growth.

This role is hybrid (3 days in the office), based in Birmingham, London, or Manchester.

What you’ll be doing

  • Own the Private Cloud “EC.3” Capacity Management Platform – act as the single accountable owner for capacity planning, forecasting, modelling, and optimisation across the VMware-based Enterprise Cloud v3 environment.
  • Define and Deliver the Capacity Roadmap – translate business demand and programme milestones into a prioritised backlog of features and automation, using Agile delivery practices.
  • Implement SRE Guardrails – establish SLIs, SLOs, and error budgets for infrastructure-related reliability; ensure proactive risk management.
  • Develop Forecasting Models – build accurate short-, medium-, and long-term capacity forecasts using telemetry and scenario analysis to prevent saturation and ensure headroom.
  • Automate Capacity Workflows – reduce manual toil by creating scripts, policies, and integrations for rightsizing, placement, and quota enforcement using PowerCLI, APIs, and IaC.
  • Maintain Real-Time Telemetry & Dashboards – provide a single source of truth for utilisation, trends, and optimisation opportunities through VMware Aria Operations (vROps) and reporting tools.
  • Optimise Cost and Efficiency – align with FinOps principles to deliver showback/chargeback reporting, identify waste, and implement cost-saving measures without compromising reliability.
  • Integrate with ITSM & Governance – ensure ServiceNow CMDB accuracy, automate request fulfilment, and maintain compliance with capacity policies and audit requirements.
  • Collaborate Across Teams – work closely with Architecture, Programme Delivery, Finance, and Operations to align capacity decisions with strategic objectives and risk appetite.
  • Continuously Improve – evolve the capacity management capability through iterative enhancements, stakeholder feedback, and adoption of emerging best practices.

Leadership Accountabilities

  • Vision & Strategy – Define and communicate the long-term vision for capacity management on EC.3, ensuring alignment with business objectives and technology strategy.
  • Ownership & Accountability – Act as the single point of accountability for capacity planning, forecasting, and optimisation across the VMware platform.
  • Influence & Stakeholder Engagement – Build strong relationships with senior stakeholders, programme leads, and cross-functional teams to drive decisions and secure buy-in.
  • Agile Leadership – Champion Agile ways of working, ensuring backlog prioritisation, iterative delivery, and continuous improvement of the capacity capability.
  • Reliability Governance – Embed SRE principles into leadership decisions, balancing innovation with risk management through SLIs, SLOs, and error budgets.
  • Financial Stewardship – Lead cost optimisation initiatives aligned with FinOps principles, ensuring efficient use of resources and transparent reporting.
  • Team Enablement – Mentor and guide engineers and analysts, fostering a culture of automation, data-driven decision-making, and operational excellence.
  • Change Leadership – Drive adoption of new processes, tools, and automation across teams, ensuring smooth transitions and minimal disruption.
  • Executive Communication – Provide clear, concise updates on capacity health, risks, and roadmap progress to senior leadership and governance boards.
  • Continuous Improvement – Lead retrospectives and postmortems to identify systemic improvements and embed lessons learned into future planning.

Key Decisions

  • Capacity Headroom Policy – Define minimum thresholds for CPU, memory, and storage across clusters to ensure reliability and performance.
  • Forecasting Approach – Select and implement the models and tools used for short-, medium-, and long-term capacity planning.
  • Automation Priorities – Decide which manual processes to automate first (e.g., rightsizing, placement, quota enforcement) to reduce toil and improve efficiency.
  • SLO & Error Budget Targets – Set reliability objectives for capacity-related metrics and determine acceptable risk levels for change management.
  • Optimisation Strategy – Choose cost-saving measures (e.g., rightsizing, decommissioning, reserved capacity) while balancing performance and resilience.
  • Tooling & Integration Choices – Determine which platforms (e.g., VMware Aria Operations, ServiceNow, Power BI) and scripts will form the core of the capacity management capability.
  • Governance & Compliance Controls – Establish policies for capacity requests, approvals, and audit readiness.
  • Reporting & Communication Cadence – Decide how often and in what format capacity health, risks, and forecasts are shared with stakeholders.
  • Change Freeze & Risk Mitigation – Make calls on when to pause non-essential changes based on capacity risk or error budget breaches.
  • Continuous Improvement Roadmap – Prioritise enhancements to forecasting accuracy, automation coverage, and stakeholder experience.

Skills & Experience Required for the Role

Essential:

  • Deep VMware Expertise – hands-on experience with vSphere, vCenter, vSAN, NSX-T, and VMware Aria Operations (vROps) for capacity analytics and optimisation.
  • Capacity Planning & Forecasting – ability to model demand, headroom, and growth scenarios using telemetry and data-driven methods.
  • Automation & Scripting – proficiency in PowerCLI, Python, and API integrations to automate rightsizing, placement, and quota enforcement.
  • Agile Delivery Skills – experience managing backlogs, writing user stories, and delivering incremental improvements through sprints and ceremonies.
  • SRE Practices – strong understanding of SLIs, SLOs, error budgets, and reliability engineering principles applied to infrastructure capacity.
  • Observability & Analytics – ability to design dashboards and alerts for utilisation, saturation, and optimisation opportunities.
  • FinOps Awareness – knowledge of cost optimisation, showback/chargeback models, and unit economics for infrastructure services.
  • Governance & Compliance – familiarity with ITSM tools (e.g., ServiceNow), CMDB data integrity, and audit-ready processes.
  • Stakeholder Engagement – excellent communication and influencing skills to align capacity decisions with business priorities.
  • Continuous Improvement Mindset – proactive approach to evolving processes, reducing toil, and adopting emerging best practices.

Experience you’d be expected to have

  • Proven track record in capacity management for large-scale VMware environments (vSphere, vCenter, vSAN, NSX-T).
  • Hands-on experience with VMware Aria Operations (vROps) or similar tools for capacity analytics, forecasting, and optimisation.
  • Automation and scripting expertise using PowerCLI, Python, and API integrations to reduce manual toil and enforce policies.
  • Agile delivery experience, including backlog management, sprint planning, and stakeholder engagement for platform capabilities.
  • Site Reliability Engineering (SRE) practices applied to infrastructure—defining SLIs/SLOs, managing error budgets, and improving reliability.
  • Performance engineering knowledge, including CPU/memory/storage utilisation, contention analysis, and headroom policies.
  • Cost optimisation and FinOps alignment, with experience in showback/chargeback models and unit economics for infrastructure services.
  • ITSM and governance experience, particularly ServiceNow CMDB integration and compliance with audit requirements.
  • Cross-functional collaboration, working with architecture, programme delivery, finance, and operations teams to align capacity decisions with strategic objectives.
  • Continuous improvement mindset, with a history of evolving processes, implementing automation, and driving operational excellence.

Benefits

  • 10% on-target bonus
  • BT Pension scheme, minimum 5% employee contribution, BT contribution 10%
  • From January 2025, equal family leave: receive 18 weeks at full pay, 8 weeks at half pay and 26 weeks at the statutory rate. It’s for all parents, no matter how your family is made up.
  • Enhanced women’s health support: including help with menopause symptoms, cancer screenings, period care and more.
  • 25 days annual leave (not including bank holidays), increasing with service
  • 24/7 private virtual GP appointments for UK colleagues
  • 2 weeks carer’s leave
  • World-class training and development opportunities
  • Option to join BT Shares Saving schemes.

About us

BT Group was the world’s first telco and our heritage in the sector is unrivalled. As home to several of the UK’s most recognised and cherished brands – BT, EE, Openreach and Plusnet, we have always played a critical role in creating the future, and we have reached an inflection point in the transformation of our business.

Over the next two years, we will complete the UK’s largest and most successful digital infrastructure project – connecting more than 25 million premises to full fibre broadband. Together with our heavy investment in 5G, we play a central role in revolutionising how people connect with each other.

While we are through the most capital-intensive phase of our fibre investment, meaning we can reward our shareholders for their commitment and patience, we are absolutely focused on how we organise ourselves in the best way to serve our customers in the years to come. This includes radical simplification of systems, structures, and processes on a huge scale. Together with our application of AI and technology, we are on a path to creating the UK’s best telco, reimagining the customer experience and relationship with one of this country’s biggest infrastructure companies.

Change on the scale we will all experience in the coming years is unprecedented. BT Group is committed to being the driving force behind improving connectivity for millions and there has never been a more exciting time to join a company and leadership team with the skills, experience, creativity, and passion to take this company into a new era.

A FEW POINTS TO NOTE:

Although these roles are listed as full-time, if you’re a job share partnership, work reduced hours, or any other way of working flexibly, please still get in touch.

We will also offer reasonable adjustments for the selection process if required, so please do not hesitate to inform us.

DON'T MEET EVERY SINGLE REQUIREMENT?

Studies have shown that women and people who are disabled, LGBTQ+, neurodiverse or from ethnic minority backgrounds are less likely to apply for jobs unless they meet every single qualification and criterion. We're committed to building a diverse, inclusive, and authentic workplace where everyone can be their best, so if you're excited about this role but your past experience doesn't align perfectly with every requirement in the Job Description, please apply anyway - you may just be the right candidate for this or other roles in our wider team.

Company benefits

Enhanced maternity leave – 18 weeks full pay, 8 weeks half pay, 6 months statutory
Enhanced paternity leave – 18 weeks full pay, 8 weeks half pay, 6 months statutory
25 (UK, increasing with service) / 21 (India) days annual leave + bank holidays
Carer’s leave – Two weeks paid leave
Open to job sharing
Open to part time work for some roles
Private GP service – 24/7 virtual GP access for UK colleagues
Mental health platform access – Silvercloud
Adoption leave – 18 weeks full pay, 8 weeks half pay, 6 months statutory
Shared parental leave
Buy or sell annual leave – buy up to 5 days/year pro rata
Employee assistance programme
Bank holiday swaps
Share options
Compassionate leave
Faith rooms
Salary sacrifice
Employee discounts
Cinema discounts
Enhanced sick pay – 3 months
Optional unpaid leave
Returnship
Complimentary Medical Services
Travel loan
Enhanced pension match/contribution
Volunteer days – 3 volunteer days per year
Lunch and learns
Cycle to work scheme
In house training
Mentoring
LinkedIn Learning licence – unlimited access
Learning platform – internal and external learning content via Degreed
L&D budget – sponsored accreditation available for certain professions
Coaching
Referral bonus
Neo-natal leave

Working at BT Group

Company employees:

100,000 across BT Group (24,000 at BT Business)

Gender diversity (m:f):

74.3:25.7 (BT Group)

Hiring in countries

Brazil

Canada

Hungary

India

Ireland

Spain

United Kingdom

United States

Awards & Accreditations

Family Friendly – Flexa awards 2025
Career Progression – Flexa awards 2025
