At Papercup we’re on a mission to make the world’s videos watchable in any language. We’ve invented a patented AI system that generates humanlike synthetic voices across languages, allowing people to watch video content in the language of their choice. Our translated and dubbed content has allowed the likes of Insider, Discovery, Sky News, and Canva to reach over 300 million people globally in just the last year. Having just completed a $20 million Series A round, we're on the hunt for top people to join our ambitious mission.
We’re backed by some of the industry’s heaviest hitters - venture funds like Octopus Ventures, world-renowned angel investors including Des Traynor (co-founder of Intercom) and John Collison (co-founder of Stripe), as well as global media groups like Sky and Guardian Media Group.
We are driven, curious and passionate - our company culture is imperative to us and we set a high bar for those who join the team. We're also fun to be around (at least that's what people tell us).
About the role:
At Papercup, you will be part of a great team pushing the boundaries of neural text-to-speech and speech-to-speech translation systems. Our team works closely with leading speech processing academics, Mark Gales and Simon King, as advisors, and regularly publishes in top speech conferences. You will apply modern machine learning techniques to model the way people speak (prosody): where they place intonation, how they convey emotion, and so on. The exact direction of the project will depend on the interests of the student, but we see two main areas of focus:
- Applying self-supervised learning and foundation models to prosody modelling
- Our aim is to leverage self-supervised learning and foundation models to aid our prosody modelling
- We have a very large, human-enhanced synthetic training set that we can use to train very large prosody models
- Audio production using machine learning
- To sound realistic, a synthetic voice must sound like it is in the correct acoustic environment, similar to creating the correct lighting of an object in image synthesis
- Here we want to apply machine learning to solve this audio production task automatically
- And much more. Please get in touch for more details.
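As a toy illustration of the second focus area (this is not Papercup's actual pipeline, and every signal and parameter below is a made-up stand-in): placing a "dry" synthetic voice into an acoustic environment can be simulated by convolving it with a room impulse response (RIR), which is what a learned audio-production model would implicitly have to capture.

```python
import numpy as np

# Toy sketch: put a dry synthetic voice "into a room" by convolving
# it with a room impulse response (RIR). All signals are synthetic
# stand-ins for illustration only.
sr = 16000
t = np.arange(sr) / sr
dry_voice = np.sin(2 * np.pi * 220 * t)  # 1 s tone standing in for TTS output

# A crude hand-built RIR: a direct path plus a few decaying reflections.
rir = np.zeros(sr // 4)
rir[0] = 1.0                              # direct sound
for i, delay in enumerate([400, 1100, 2500]):
    rir[delay] = 0.6 ** (i + 1)           # each reflection is quieter
rir *= np.exp(-np.arange(len(rir)) / (sr * 0.05))  # overall exponential decay

# The "wet" voice now carries the room's acoustic signature.
wet_voice = np.convolve(dry_voice, rir)   # full convolution: len N + M - 1
```

In practice a model would either estimate the environment from reference audio or apply the transformation end-to-end, rather than using a hand-built RIR as above.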
Relevant papers:
- u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality
- wav2vec: Unsupervised Pre-training for Speech Recognition
- SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
- Using VAEs and Normalizing Flows for One-Shot Text-to-Speech Synthesis of Expressive Speech
- Using generative modelling to produce varied intonation for speech synthesis
Requirements:
- This is an internship for Masters students in Machine Learning
- Experience developing machine learning models using PyTorch or TensorFlow
- Theoretical understanding of deep learning
- Desire to lead your own research
Nice to haves:
- Experience with generative modelling
- Experience working with ASR and/or TTS systems
- Good knowledge of audio and signal processing fundamentals
- Familiarity with AWS, GCP, Kubernetes, Azure
Working at Papercup
London - Shoreditch
$20m Series A
What employees are saying
"I love how management leads by example in terms of unlimited holiday, taking time off, and really disconnecting, which helps to send the message to everyone in the company that this is how we expect time off to look."
Anonymous Papercup Employee