At Papercup we’re on a mission to make the world’s videos watchable in any language. We’ve invented a patented AI system that can generate synthetic voices across languages - each of which sounds just like a human. Our translated and dubbed content has allowed the likes of BBC, Discovery, HSBC, and Canva to reach over 300 million people globally.
Having recently raised a $20 million Series A round, we’re backed by some of the industry’s heaviest hitters - venture funds like Octopus Ventures, world-renowned angel investors including Des Traynor (co-founder of Intercom) and John Collison (co-founder of Stripe), as well as global media groups like Sky and Guardian Media Group.
We are driven, curious and passionate - our company culture is very important to us and we set a high bar for those who join the team.
About the role
As a Data Engineer at Papercup, you will lead the development of our data platform by influencing our choices of architecture and technology. Part of your responsibilities will include ensuring that data is reliable and easily accessible across the company. As one of the first dedicated data engineers in the company, you will be working with the teams to establish the data platform solutions to empower BI solutions, workflow optimization, and machine learning research.
Your main goal is to be one step ahead of data scientists and analysts, consult engineers on optimal data solutions, and work with product managers to identify relevant data points. Having maintainable and flexible data systems will be one of the major pillars of the data platform team's success in a rapidly changing environment like Papercup.
About our Data:
At Papercup we build internal tooling to help make the world's content accessible. To achieve our mission, we currently use a Human in the Loop (HitL) system to ensure that we are producing high-quality output. The majority of our data is created by the HitL system, and we use it to make product decisions, optimise the workflow, and provide insights into our machine learning pipeline.
Our current data platform consists of the following solutions: Airflow for orchestrating our machine learning data pipeline; Airbyte for extracting and loading our data into our data lake (BigQuery); Mixpanel for capturing event-level data; and Metabase as our BI solution.
What you will do with us:
- Develop robust ETL/ELT processes and maintain data pipelines for our human in the loop systems, customer data and machine learning feature engineering
- Educate the business on new data techniques and embed them through role modelling, data governance practices, training and experiment design oversight
- Optimise data models with engineers to provide reliable and performant data access to production and analytics data systems
- Identify bottlenecks in internal data processes and work to resolve them by automating processes, optimising data delivery and storage, re-designing infrastructure for greater scalability, etc
- Develop comprehensive knowledge of our data structures and metrics, create data models and data flow diagrams to enhance data understanding, and advocate change where needed for product development
- Advise on future design approaches and principles, taking data privacy rules and data regulations into account
What we need from you:
- Experience using data pipeline building tools, ETL/ELT, reporting and analytics tools, preferably working within a Kubernetes environment
- Solid understanding of SQL
- Familiarity with NoSQL solutions
- Experience working with data lake solutions such as BigQuery/Databricks/Snowflake
- Ability to get to the bottom of ambiguous requirements and identify missing pieces
- Ability to identify and adopt new technologies quickly; we are always looking to make the most of cutting-edge tools to increase productivity
Nice if you have:
- Experience working with workflow orchestration tools such as Airflow
- Experience setting up data pipelines with Airbyte and dbt
- Good familiarity with major cloud providers such as AWS/GCP/Azure
- Ability to work with Product and ML teams to understand requirements and translate them into well-designed data models and pipelines
- Experience in a fast-paced environment (e.g. high growth start-up or consultancy - but we are open to wider relevant experience)
Why join us:
- We really care about our people - culture matters deeply to us and we are committed to building a company that people are proud to work for
- We're growing fast. That means you can help grow the business from the ground up and maintain a lot of ownership
Apart from all this good stuff, what else do we offer?
- Competitive salary of £60,000 - £75,000 dependent on experience
- Unlimited vacation policy
- Hybrid working: flex between WFH and time in our Old Street office (around 2 days per week)
- Private medical cover or monthly wellness bonus
- Learning budget and 'reading week' to carve out time to upskill in your domain
- The usual food and fun perks: snacks, beer fridge, regular team socials, annual offsites
Please note we’re not looking for someone who ticks all the boxes. If you have some of the skills listed above and are willing to learn, you’re the person for Papercup.
Once we receive your application, we will try to review it and respond within 3 working days. We will arrange a 30-minute video call with you if our team thinks your skills and experience are a good fit for Papercup.