Microsoft UK • London, United Kingdom

Site Reliability Engineer - Met Office

Employment type:  Full time

Job Description

Microsoft is on a mission to empower every person and every organization on the planet to achieve more, and the Azure cloud is at the forefront of this mission. Are you interested in working on one of the most exciting products in Microsoft Azure, and passionate about exceeding customer expectations and advancing Microsoft's cloud-first strategy? The Azure Customer Experience (CXP) team is searching for a customer-obsessed Site Reliability Engineer to work on an HPC environment, someone who can drive reliability engineering excellence and embody our culture of inclusiveness, growth mindset, and unwavering dedication to diversity.

We are a fast-paced agile team with a start-up-like culture where you are empowered to help shape the future. Our “no dead-ends”, “whatever it takes”, “biased for action”, “make it better than ever” philosophy ensures that every customer can realize their full potential through the Microsoft Cloud. We are a fast-growing team, and we are committed to remaining agile. Customer first, nurturing trust, high responsiveness, automation, SLO/SLI/SLA, blameless post-mortems, observability, monitoring, alerting, and toil reduction form the foundations of our code, and we work with teams across Microsoft and external customers to ensure success. We work on exciting engineering challenges in a fun and supportive environment, with access to cutting-edge technology and surrounded by world-class engineers.

Responsibilities

  • Collaborating closely with the existing SRE teams to build and enhance tooling and automation that resolve issues impacting SLOs faster and avert incidents altogether where possible.
  • Collaborating with customers to understand their pain points around Supportability and SLO attainment, and formulating strategies for addressing recurring issues in a sustainable way.
  • Communicating on a deeply technical level and acting as the single point of contact for a large enterprise customer, handling service escalations and driving issues to resolution.
  • Designing and implementing changes to service telemetry for the automation to consume where that telemetry is not already available.
  • Enhancing the customer-facing experience through proactive alerting based on utilisation, trends, resource health, etc.
  • Analysing data and providing operational insights into customer experience to Design and Product teams, so that we can design features with Supportability in mind.

Qualifications

  • In-depth technical experience in software engineering, network engineering, or systems administration
  • Operational experience in improving Service Reliability, Availability and Performance
  • Ability to deal with the ambiguity associated with working in a fast-paced environment
  • Systematic problem-solving approach, coupled with effective communication skills and a sense of curiosity
  • Expertise in analysing, troubleshooting, and automating root cause analysis and mitigation of incidents impacting large-scale distributed systems
  • Ability to travel regularly to the customer site in the South West of the UK

Preferred Qualifications

  • Prior HPC knowledge
  • Experience influencing product architecture and roadmaps so that customer-experienced supportability is always a key consideration when evolving the product

Other Requirements

The ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:

  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
  • UK Baseline Personnel Security Standards
  • UK Security Clearance


Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.

Company benefits

Wellbeing allowance
Health insurance
Dental coverage
Gym membership
Mental health platform access
Buy or sell annual leave
Shared parental leave
Charity donation scheme
Employee assistance programme
Employee discounts
Volunteer days – 3 days a year
Fertility treatment leave
Open to compressed hours
Open to job sharing
Fertility benefits
Enhanced sick pay
Enhanced sick days
Compassionate leave
Travel insurance
20 days annual leave + bank holidays
Enhanced maternity leave – 26 weeks paid
Enhanced paternity leave – 6 weeks paid
Adoption leave – 24 weeks paid
Childcare credits
Carer’s leave – 4 weeks paid
Cycle to work scheme
Faith rooms
Annual bonus
Annual pay rises
Company car
Hackathons
Open to part-time employees
Pregnancy loss leave
Life insurance
Equity packages
Financial coaching
Relocation packages
Sabbaticals
Enhanced pension match/contribution
Family health insurance
LinkedIn learning license
In-house training
Personal development days

Working at Microsoft UK

Company employees (globally): 228,000

Gender diversity (m:f): 67:33

Hiring in countries

Germany

Netherlands

Spain

United Kingdom

Awards & Accreditations

Family Friendly (Flexa awards 2025)
Career Progression (Flexa awards 2025)
Most flexible companies (Flexa100 2024)
