Distributed Systems Architect
San Francisco, CA (In-Person)
General Diffusion’s mission is to decouple intelligence from silicon. We believe the path to AGI requires a universal translation layer that makes compute fungible across any architecture—from H100s to TPUs to neuromorphic chips.
About the role
As a Distributed Systems Architect, you will lead the design of RS1 (Resource Scheduler Agent), the "brain" of our OS that orchestrates workloads across fragmented clusters. You will solve NP-hard scheduling problems in real time, managing state across thousands of heterogeneous nodes with varying latency and bandwidth constraints.
What you might work on
- Designing the RS1 global scheduler to optimize throughput per dollar across spot instances and reserved clusters.
- Building a fault-tolerant distributed state store that can survive node failures without stalling training runs.
- Implementing "live migration" for active inference contexts between different hardware types (e.g., H100 -> TPU v5).
- Optimizing the interconnect protocol to minimize serialization overhead in a multi-cloud environment.
- Creating simulation environments to stress-test the scheduler against network partitions and stragglers.
What we’re looking for
- Experience building large-scale distributed systems (Kubernetes internals, Paxos/Raft, distributed databases).
- Proficiency in Rust or Go for high-performance systems programming.
- Understanding of RDMA, InfiniBand, and modern datacenter networking.
- Experience with cluster schedulers (Slurm, Ray, Borg) is a strong plus.
- Ability to reason about consistency models and distributed consensus.
Our culture
- Silicon Neutrality. We build for the world where compute is a commodity, not a monopoly.
- Radical Efficiency. We believe software bloat is an existential risk to AGI.
- Deep Work. We value long periods of uninterrupted focus over endless meetings.
