Element • United States

Site Reliability Engineer (North America)

Employment type: Full time

Remote-first

Fully flexible hours

Dog friendly

Job Description

Who We Are

Element is the startup that employs the core team behind matrix.org — the leading project for secure, open decentralised communication.

Matrix’s mission is to make messaging as open as email — allowing everyone to choose where their data is hosted, enjoy private conversations thanks to advanced encryption, and ultimately be in control of their own communication.

Matrix powers our flagship messaging apps for the web, iOS & Android, along with Element Matrix Services, our SaaS platform for personal & professional use.

We build things for everyone, and we know we can’t succeed without a diverse team. Our hiring process is designed to give candidates the best chance to show us what they can do. If we ever fall down on this, please let us know.

About Your Team

Today we are a small team of five engineers working hard to transform how operations and infrastructure are done within the organisation. We come from various backgrounds and are a remote-first team, with all of us working from different countries across Europe and the UK.

As part of our day-to-day operations, we use or touch on (in no particular order) AWS, UpCloud, Postgres, Grafana, Prometheus, Loki, Elasticsearch, PagerDuty, Python, Red Hat OpenShift, GitLab, GitHub, Ansible, AWX, Terraform, Keycloak, Linux, Kubernetes, Golang, HAProxy and Nginx, to name a few.

The Team Today

  • We manage both internal and client infrastructure across private and public clouds, in private data centers and on Kubernetes clusters. Translation - we ssh into boxes, apply Ansible, use Terraform, manage Kubernetes clusters, manage configurations and release roll-outs.
  • We react to and resolve various issues within the infrastructure. Translation - we respond to alerts and pages, go on-call, look at Grafana dashboards, isolate and debug production issues, roll out mitigations where we can, etc. We are predominantly responsible for the availability of most services deployed in the organisation.
  • We are responsible for internal IT. Translation - we help on-board new employees and manage things like mail, calendar access, SSO etc.
  • We help our clients understand their needs, identify bottlenecks and manage their on-premise Matrix services. Translation - we do a bit of consulting work with our Professional Services team and also manage services on-premise on behalf of our customers.
  • We are working with intent towards our tomorrow. Translation - we dedicate time where we can to automate, modernise or fix our current assets, improve our processes/platforms and build best practices for our engineering teams.

The Team Tomorrow

  • We are cloud native and container first. Translation - our focus is on developing and delivering artefacts and automation predominantly targeting cloud environments. In particular, we are focused on container native environments running on both managed and self-managed Kubernetes clusters at scale.
  • We are focused on developer enablement. Translation - we focus on enhancing developer experiences by improving CI/CD pipelines, sharing cloud native development expertise and codifying our expertise in this area. We provide a reliable infrastructure and platform for developers to build and deploy services to production. Developers are responsible for the day-to-day automation of their services, assisted by the tools and processes provided by us.
  • We are focused on Site Reliability. Translation - we codify the operational tasks, automate the recovery from incidents and we manage cattle not pets. We provide infrastructure, platform and tools for services to run at scale. And we enable automated operations across most if not all our mission critical services.

Requirements

About you

We are presently in the process of working towards our tomorrow. And we want to bring more team members along for this journey. We want to work with collaborative and kind people who do not mind experimenting with the unknown.

What we care about

  • You are kind, empathetic and willing to share your knowledge and experience.
  • You are willing to ask for help and to provide it when you can.
  • You are keen on learning new things and figuring out how to improve the status quo.
  • We try (operative word here) to focus on getting things done right, but this does not mean we do it right the first time, as being pragmatic is important to us.

What are the basics you need

  • We do not need you to have any experience in decentralised communication, nor do we expect you to be experienced and knowledgeable in everything. However, there are a few things that being familiar with will help you get your job done.
  • Linux Servers - You can ssh into a machine, update packages, get at logs, figure out why it is misbehaving.
  • Containers - You have built your own containers before, used them in anger, and understand the basics of how they work.
  • Infrastructure automation - You have worked with at least one of Terraform or Ansible. We are also happy if you have worked with similar tools, like Puppet, Chef or Saltstack etc.
  • Public/Private Cloud Providers - You have used at least one of AWS / Azure / GCP. Hopefully, you have used Terraform to automate infrastructure on them.
  • Programming languages - You have written some meaningful code that did some of your automation for you. Preferably in Python or Go. You are able to look at an unknown code base and understand it enough to try debugging it in production.

About The Process

  • Opportunity fit. At this stage you will be talking to a member of our people team. We will talk a bit about the company, about you, about the role etc. You will get a chance to ask questions and understand better if the role fits what you are looking for. Obviously, we will have a few questions for you as well, but no whiteboards or algorithms involved.
  • Offline coding exercise. We will send you a coding exercise, nothing overly complex. Something that might take someone 1-2 hours to complete. We are looking for your approach, an insight into how you solve a problem and basic smell tests on your coding practices.
  • Interview with the team. In this stage, you will be talking to a couple of members from the team. We will talk about your coding exercise submission and talk about improvements etc. You will get an opportunity to talk to the team about the day-to-day, the organisation, the team, their experiences etc. We will definitely ask some technical questions here. We are looking for insights into how you communicate your ideas and technical solutions. We expect the conversation to be two-way.
  • Architecture. Here, you will be talking to the team's Engineering Manager and our VP of Engineering or Founder/CTO. Expect conversations around architecture, building at scale etc. You will get the opportunity to understand more about Element’s history, its future direction etc.
  • If you have any questions before making an application, reach out to Adam (@adamt:element.io) via https://app.element.io

    Benefits

    People tend to stay with the company for a long time. We take this as a sign that we have a cohesive, supportive culture, that we have engaging, challenging work and that people can develop their skills and careers here for the long term. We also have a family friendly environment: many of the team have small children and we look to accommodate that as best we can. Since our technology is relevant to anything that requires real-time comms, the role provides exposure to a wide range of domains from web and app dev through to VR, VoIP and IoT.

    Our package generally contains:

    • Private Health Insurance
    • Pension
    • Annual Bonus
    • Share Options
    • Home Office Allowance
    • Coworking Space Allowance
    • Annual leave (40 days including local bank holidays)
    • Company Socials (virtual or in person)
    • Annual Global Offsite
    • Plumm Health (mental health platform)

    You can find a more detailed explanation here and you might be eligible for other benefits depending on your location.

    Element does not discriminate on the basis of race, sex, colour, religion, age, national origin, marital status, disability, veteran status, genetic information, sexual orientation, gender identity or any other reason prohibited by law in provision of employment opportunities and benefits.

    Company benefits

    Open to part-time employees
    Open to job sharing
    Sabbaticals
    Work from anywhere scheme
    Enhanced sick pay
    Pregnancy loss leave

    Working at Element

    Company employees

    115

    Gender diversity (male:female:non-binary)

    83:16:1

    Office locations

    London, UK and Rennes, France

    Funding levels

    $48,100,000

    What employees are saying

    "Element is 100% one of the most flexible companies I've ever worked for, and it makes a big difference. I've had recruiters offer me salaries that are £60k higher, and it's still not enough to make me want to leave this lovely place. Founders make a BIG DEAL about flexible working. In fact, they have apologised company wide for sending emails a bit late or a bit early. And they've made it VERY CLEAR that this is because they take their children to school or take time out, and the rest of us aren't supposed to feel pressured by that. It's just a lovely environment all round and I feel very lucky!"

    Anonymous Element Employee

    Awards & Achievements

    Most flexible SaaS & Dev Software companies

    Industry awards 2022