Director of Platform Engineering - Somnia

Employment type: Full-time

Job Description

Somnia’s technology portfolio covers both the core high-performance Somnia EVM blockchain and a set of key protocols and features, ranging from on-chain decentralised LLM execution to prediction markets and fully on-chain order book DEXes.

Building and operating decentralised infrastructure and protocols brings a natural challenge: maintaining the reliability, security and performance of the chain and protocols while sustaining engineering velocity with a lean team and supporting downstream application development.

The Role

As Director of Platform Engineering, you will own the ongoing operations of the chain and core protocols, and lead the organisation responsible for them. You’ll report to the CTO and partner closely with product engineering leadership to keep rapid feature development and system reliability aligned. We run a multi-cloud footprint (GCP, Runpod) supporting an ultra-high-TPS L1 blockchain, RPC node fleets, a DEX, consumer products, and AI-driven oracles. Many third-party operators also run parts of the chain, the protocols, and other key services. The infrastructure is live and operational, but our complexity is outpacing the processes and tooling around it, reliability practices are still informal, and the team is ready for more structured leadership.

Your first priority is formalising reliability. We need SLOs defined, incident response tightened, and a deliberate shift toward proactive engineering. From that foundation, you'll build toward a self-serve internal developer platform that reduces infra as a bottleneck for product teams. You'll also carry the operational side of running a ~30-person platform org: headcount, vendor relationships, and cloud spend. This is a senior leadership role with real scope. You'll have a functioning foundation to build on, but plenty of room to put your stamp on how it evolves.

Your Opportunity

Lead and scale the platform org. You'll directly lead Engineering Managers, Tech Leads, and Staff/Principal engineers across multiple time zones (~30 people today, growing). Own hiring, performance calibration, and the technical quality bar.
Stand up reliability as a discipline. Define SLOs across all product lines, build an SRE function, and mature our incident response function into a team that engineers for uptime. This is your first mandate.
Build the internal platform. Design and ship tooling (CI/CD, environments, observability) so product teams can deploy confidently with less hands-on infra involvement. You'll set the technical strategy for our cloud-native environments and build consensus with product engineering leadership to get it adopted.
Be the CTO's operational partner. Own cloud spend (seven figures) and headcount planning. Be the exec team's trusted voice on infrastructure risk, engineering capacity, and technical trade-offs.
Drive a high-performance engineering organisation. Partner closely with the people team to own and continuously evolve our engineering calibration processes, ensuring we maintain a consistently high talent bar across all teams.

We're Looking For

A proven engineering leader at scale. 8+ years leading engineering teams, with at least 3 years at the senior manager / director / VP level overseeing 25+ engineers across multiple teams and time zones. You've built and developed managers, not just managed ICs.
Deep SRE and reliability experience. You've built or matured reliability practices, defined SLO frameworks, built on-call rotations and incident-response programmes, and raised the bar from informal to systematic.
Distributed systems fluency. You've designed and operated high-performance, low-latency distributed systems at a meaningful scale. You have strong views on IaC and orchestration, even if you're not writing Terraform daily.
Comfort with high-scale, high-stakes infrastructure. You've run cloud platforms through rapid traffic growth, massive concurrency, or tight latency constraints, in any demanding domain (ad tech, gaming, fintech, large-scale SaaS, or similar).
Technical credibility without being a bottleneck. You can go deep in architecture reviews, system design, and production incidents. But you build the team and processes so you don't have to be in every room.
Operational ownership. You've managed engineering budgets, headcount planning, and vendor relationships. You treat operational rigour as part of the job, not overhead.
Running efficient, high-performance teams. You have experience running fair, rigorous, and data-informed performance reviews, promotion cycles, and compensation planning.

Nice to Have

Web3 and blockchain domain experience. Familiarity with L1 infrastructure, validator operations, EVM, DeFi ecosystems, or the operational quirks of crypto workloads.
AI/ML infrastructure experience. Exposure to deploying and scaling model inference workloads or oracle networks.
True multi-cloud experience. Operating across different cloud providers simultaneously (not just multi-region within one).
AI operations. You embrace the new ways of operating and working that emerging technologies and processes are bringing to the industry.

While we think the above experience is important, we’re very keen to hear from people who believe they have valuable experience to bring to this role. If you identify with the team and mission but don't meet all of our requirements, please still apply.
