
Resilience Engineer
Job Description
Join Us
By connecting people, places, and things, Vodafone IoT enables organisations to thrive in the digital world. Leveraging our expertise in connectivity, our advanced IoT platform, and our extensive global reach, we deliver the results necessary for our customers' progress and success. We support businesses of all sizes and sectors in their efforts to connect for a better future.
The Vodafone Internet of Things (IoT) suite of products and services is specifically designed to meet the demands of emerging business verticals. Our connection base has grown 20% year over year, reaching over 200 million connections by the end of financial year 2025. Vodafone IoT has been named a leader in the Gartner Magic Quadrant for IoT connectivity ten consecutive times. To address the technological needs of IoT, Vodafone has developed an industry-leading IoT Connectivity Management Platform, targeting key strategic growth opportunities to meet the global requirements of IoT customers.
Vodafone has also carved out the IoT Connectivity business to secure additional external investment and maintain our leading position in the industry. The aims are to:
- Continue accelerating and enhancing our Platform as a Service for Vodafone customers within Vodafone's footprint.
- Introduce service propositions in markets beyond Vodafone's current footprint.
- Address the long-tail, lower-volume segment globally through a digital self-service platform.
We are seeking a senior Resilience Engineer to own and evolve the stability, availability, and recoverability of our IoT platforms. This role operates at the intersection of system architecture, reliability engineering, and operational excellence, with end-to-end accountability for designing resilience into our services. You will define and govern resilience strategies, influence platform architecture, and partner across product, infrastructure, and engineering teams to ensure our systems continue to perform under failure, scale, and unexpected disruption.
What you’ll do
- Developing and governing resilience strategies across system architecture, deployment, monitoring, and incident response;
- Defining and tracking stability KPIs (e.g., MTTD, MTTR, error budgets), partnering with performance and operations teams to meet or exceed targets;
- Designing and implementing fault injection testing, chaos engineering practices, and scenario-based simulations to validate platform robustness;
- Collaborating with product, infrastructure, architecture and development teams to re-design services with built-in redundancy, failover, and graceful degradation;
- Driving automation and observability improvements to reduce noise, increase fault detection speed, and support predictive failure mitigation;
- Contributing to the design and maintenance of our Business Continuity and Disaster Recovery Plan (BCDR), ensuring IoT systems remain resilient and recoverable in the face of unexpected disruptions;
- Owning the resilience roadmap and continuously assessing emerging threats, technologies, and architectural shifts to guide evolution of stability practices;
- Evangelizing a culture of resilience through internal communication, workshops, and post-incident learning programs;
- Delivering high-quality engineering solutions while continuously strengthening the resilience, scalability, and cost efficiency of our IoT platform;
- Consistently meeting or exceeding delivery expectations by prioritizing the highest-leverage resilience initiatives that improve customer experience, business outcomes, and financial performance;
- Building trusted, transparent, and outcome-driven relationships by providing clear technical direction and trade-off recommendations to business and engineering stakeholders.
Who you are
- Educated to BSc degree level in Software Engineering, Computer Science, or a related discipline;
- Strong scripting and automation experience (e.g., Python, Bash, Go, PowerShell), with a demonstrated ability to replace manual processes with reliable, scalable automation;
- Proven experience designing and operating high-availability, fault-tolerant systems, including the use of chaos engineering techniques and proactive failure-mitigation strategies;
- Experience applying Business Continuity and resilience standards (e.g., ISO 22301) in the context of real-world platform design and operational readiness;
- Hands-on experience designing or integrating monitoring, alerting, and automated testing frameworks to support early fault detection and system validation;
- Broad experience working with Linux-based platforms across on-premises and cloud environments, with an understanding of how infrastructure choices impact reliability, scalability, and recovery;
- Deep expertise in Site Reliability Engineering principles, including SLOs/SLIs, error budgets, observability, toil reduction, and automation, with the ability to apply them at platform and system scale to guide architectural decisions and long-term resilience strategy;
- Proven ability to balance long-term platform stability with delivery velocity by making clear, data-driven trade-offs;
- Strong understanding of security principles, practices, and standards, and the ability to incorporate them into resilient, real-world technical solutions;
- Deep command of telemetry, logging, and alerting ecosystems (e.g., Prometheus, Grafana, ELK, Datadog, Splunk), with the ability to design signals that enable early fault detection and informed decision-making;
- Experience defining meaningful SLIs and building dashboards that drive architectural insight, prioritization, and corrective action;
- Proven experience leading blameless post-incident reviews, root cause analysis, and systemic improvements across multiple teams;
- Expertise in identifying and addressing system bottlenecks, latency issues, and throughput constraints in distributed environments;
- Proficiency in forecasting demand, planning capacity, and managing system growth in a cost-efficient and sustainable manner;
- Strong track record of partnering with software engineering, infrastructure, product, and business teams to embed resilience into the full development lifecycle;
- Fluency in English.
Not a perfect fit?
Worried that you don’t meet all the desired criteria exactly? At Vodafone we are passionate about empowering people and creating a workplace where everyone can thrive, whatever their personal or professional background. If you’re excited about this role but your experience doesn’t align exactly with every part of the job description, we encourage you to still apply as you may be the right candidate for this role or another opportunity.
What's in it for you
- Hybrid Work Model - Flexible hybrid work model with 8-10 in-office days per month, managed by team leaders;
- Vodafone Products and Services - Employees get a mobile phone, free communication plan, data card, and various discounts on services and products;
- Recognition - Recognition programs for innovative, creative, high-potential employees and exemplary behaviors;
- Health and Well-being - Well-being Program offers nutrition and psychological consultations, webinars, workshops, and discounts on various services and products;
- Learning - Access to Communities of Practice and a customizable digital training platform with high-quality content (namely Harvard Business Publishing, Skillsoft and Speexx);
- Local and International Mobility - Internal recruitment with local and international rotation opportunities across departments and roles.
Who we are
We are a leading international Telco, serving millions of customers. At Vodafone, we believe that connectivity is a force for good. If we use it for the things that really matter, it can improve people's lives and the world around us. Through our technology we empower people, connecting everyone regardless of who they are or where they live, and we protect the planet whilst helping our customers do the same.
Belonging at Vodafone isn't a concept; it's lived, breathed, and cultivated through everything we do. You'll be part of a global and diverse community, with many different minds, abilities, backgrounds and cultures. We're committed to increasing diversity, ensuring equal representation, and making Vodafone a place where everyone feels safe, valued and included.
If you require any reasonable adjustments or have an accessibility request as part of your recruitment journey, for example, extended time or breaks in between online assessments, please refer to https://careers.vodafone.com/application-adjustments/ for guidance.
Together we can.