At Papercup we’re on a mission to make the world’s videos watchable in any language. We’ve invented a patented AI system that generates humanlike synthetic voices across languages, allowing people to watch video content in the language of their choice. Our translated and dubbed content has allowed the likes of Insider, Discovery, Sky News, and Canva to reach over 300 million people globally in just the last year. Having just completed a $20 million Series A round, we're on the hunt for top people to join our ambitious mission.
We’re backed by some of the industry’s heaviest hitters - venture funds like Octopus Ventures, world-renowned angel investors including Des Traynor (co-founder of Intercom) and John Collison (co-founder of Stripe), as well as global media groups like Sky and Guardian Media Group.
We are driven, curious and passionate - our company culture is imperative to us and we set a high bar for those who join the team. We're also fun to be around (at least that's what people tell us).
About the role:
At Papercup, you will be part of a great team pushing the boundaries of neural text-to-speech and speech-to-speech translation systems. Our team works closely with leading speech processing academics, Mark Gales and Simon King, as advisors, and regularly publishes in top speech conferences. You will apply modern machine learning techniques to model the way people speak (prosody): where they place intonation, how they convey emotion, and so on. The exact direction of the project will depend on the interests of the student, but we see two main areas of focus:
- Applying self-supervised learning and foundation models to prosody modelling
- Our aim is to leverage self-supervised learning and foundation models to aid our prosody modelling
- We have a very large, human-enhanced synthetic training set that we can use to train very large prosody models
- Audio production using machine learning
- To sound realistic, a synthetic voice must sound like it is in the correct acoustic environment, similar to creating the correct lighting of an object in image synthesis
- Here we want to apply machine learning to solve this audio production task automatically
- And much more. Please get in touch for more details.
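As a toy illustration of the second focus area (this is not Papercup's actual pipeline, and every signal and parameter below is a made-up stand-in): placing a "dry" synthetic voice into an acoustic environment can be simulated by convolving it with a room impulse response (RIR), which is what a learned audio-production model would implicitly have to capture.

```python
import numpy as np

# Toy sketch: put a dry synthetic voice "into a room" by convolving
# it with a room impulse response (RIR). All signals are synthetic
# stand-ins for illustration only.
sr = 16000
t = np.arange(sr) / sr
dry_voice = np.sin(2 * np.pi * 220 * t)  # 1 s tone standing in for TTS output

# A crude hand-built RIR: a direct path plus a few decaying reflections.
rir = np.zeros(sr // 4)
rir[0] = 1.0                              # direct sound
for i, delay in enumerate([400, 1100, 2500]):
    rir[delay] = 0.6 ** (i + 1)           # each reflection is quieter
rir *= np.exp(-np.arange(len(rir)) / (sr * 0.05))  # overall exponential decay

# The "wet" voice now carries the room's acoustic signature.
wet_voice = np.convolve(dry_voice, rir)   # full convolution: len N + M - 1
```

In practice a model would either estimate the environment from reference audio or apply the transformation end-to-end, rather than using a hand-built RIR as above.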
Relevant papers:
- u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality
- wav2vec: Unsupervised Pre-training for Speech Recognition
- SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
- Using VAEs and Normalizing Flows for One-Shot Text-to-Speech Synthesis of Expressive Speech
- Using generative modelling to produce varied intonation for speech synthesis
Requirements:
- This is an internship for Masters students in Machine Learning
- Experience developing machine learning models using PyTorch or TensorFlow
- Theoretical understanding of deep learning
- Desire to lead your own research
Nice to haves:
- Experience with generative modelling
- Experience working with ASR and/or TTS systems
- Good knowledge of audio and signal processing fundamentals
- Familiarity with AWS, GCP, Kubernetes, Azure
Working at Papercup
London - Shoreditch
$20m Series A
What employees are saying
"I love how management leads by example in terms of unlimited holiday, taking time off, and really disconnecting, which helps to send the message to everyone in the company that this is how we expect time off to look."
Anonymous Papercup Employee