General Diffusion

Kernel Optimization Research Scientist

San Francisco, CA (In-Person)

General Diffusion’s mission is to decouple intelligence from silicon. We believe the path to AGI requires a universal translation layer that makes compute fungible across any architecture, from H100 GPUs to TPUs to neuromorphic chips.

About the role

As a Kernel Optimization Research Scientist, you will lead research on the core algorithms behind PO1 (Performance Optimizer Agent). Your goal is to automate the "black art" of kernel fusion. You will develop algorithms that analyze compute graphs and derive provably optimal tiling and fusion strategies for any given hardware topology.

What you might work on

  • Developing cost models for PO1 that predict kernel latency to within 5% of measured runtime.
  • Researching novel graph partitioning algorithms to minimize global memory round-trips.
  • Automating the discovery of fusable subgraphs in emerging architectures such as state-space models (e.g., Mamba) and RWKV.
  • Publishing research on automated kernel generation and hardware-software co-design.
  • Working closely with the compiler team to implement your theoretical findings in the CG1 backend.

What we’re looking for

  • PhD or equivalent research experience in High-Performance Computing (HPC), Systems, or ML Systems.
  • Strong publication record in venues like OSDI, SOSP, MLSys, or SC.
  • Deep understanding of roofline analysis and hardware bottlenecks.
  • Experience with analytical performance modeling of computer architectures.
  • Ability to bridge the gap between abstract graph theory and bare-metal silicon.

Our culture

  • Silicon Neutrality. We build for the world where compute is a commodity, not a monopoly.
  • Radical Efficiency. We believe software bloat is an existential risk to AGI.
  • Deep Work. We value long periods of uninterrupted focus over endless meetings.
