Kernel Optimization Research Scientist
San Francisco, CA (In-Person)
General Diffusion is a foundational AI research lab establishing the scientific discipline of Compute Intelligence. We build frontier models that learn the physics of heterogeneous hardware, decoupling intelligence from infrastructure.
About the role
As a Kernel Optimization Research Scientist, you will drive the intelligence behind PO1 (Performance Optimizer Agent). Your goal is to automate the "black art" of kernel fusion. You will develop algorithms that analyze compute graphs and mathematically prove the optimal tiling and fusion strategies for any given hardware topology.
What you might work on
- Developing cost models for PO1 that predict kernel latency within 5% of ground truth.
- Researching novel graph partitioning algorithms to minimize global memory round-trips.
- Automating the discovery of "fusable" subgraphs in emerging architectures, such as state-space models (e.g., Mamba) and RWKV.
- Publishing research on automated kernel generation and hardware-software co-design.
- Working closely with the compiler team to implement your theoretical findings in the CG1 backend.
What we’re looking for
- PhD or equivalent research experience in High-Performance Computing (HPC), Systems, or ML Systems.
- Strong publication record in venues like OSDI, SOSP, MLSys, or SC.
- Deep understanding of roofline analysis and hardware bottlenecks.
- Experience with analytical modeling of computer architecture.
- Ability to bridge the gap between abstract graph theory and bare-metal silicon.
Our culture
- Compute Intelligence. We are establishing a new scientific discipline.
- Silicon Neutrality. We build foundational models that run on any chip.
- Deep Work. We value long periods of uninterrupted focus over endless meetings.
