## Safety Is Not a Cost Center

The conventional wisdom: AI safety is a tax on innovation, something you do because regulators demand it. Goodfire's thesis inverts this completely: interpretability is not a guardrail, it is a capability unlock. When you can see inside a model, you can do things that are impossible from the outside:

- Discover knowledge the model learned but never surfaced.
- Fix specific failure modes surgically instead of retraining from scratch.
- Build classifiers without labeled data.
- Audit models for enterprise compliance with evidence from the model's internals, not just behavioral testing.

## What Ember Actually Does

Goodfire's core product is **Ember**, the first hosted mechanistic-interpretability API. It decomposes neural networks into interpretable "features" using sparse autoencoders: discrete, meaningful units of computation inside a model.

The key difference from prompt engineering or fine-tuning: Ember lets you reach inside a model and modify specific features to change behavior. Identify the internal concept responsible for hallucination or PII leakage, then dial it up or down, surgically. Goodfire has demonstrated halving hallucination rates through targeted internal intervention, with no months of RLHF required. (A minimal code sketch of this kind of feature-level intervention appears at the end of this post.)

Ember is model-agnostic, working across architectures and modalities: text, vision, audio, genomics, robotics. Goodfire trained the first-ever sparse autoencoders on DeepSeek R1 (671B parameters) and open-sourced them.

## The Landmark Results

Two results prove the thesis is more than theoretical:

**Alzheimer's biomarkers.** Working with Prima Mente and Mayo Clinic, Goodfire identified a novel class of Alzheimer's biomarkers by reverse-engineering an epigenetic foundation model. The discovery didn't come from asking the model questions; it came from examining what the model's neurons encoded. It is the first major scientific discovery obtained by interpreting a foundation model's internals.

**Arc Institute.** Arc used Ember to extract novel biological concepts from Evo 2, its DNA foundation model. Patrick Hsu (Arc co-founder) credited Goodfire with "unlocking deeper insights" invisible to conventional analysis.

## The Numbers

$207M total raised across three rounds in 18 months:

- **Seed** (Aug 2024): $7M led by Lightspeed
- **Series A** (Apr 2025): $50M led by Menlo Ventures, including **Anthropic's first-ever external investment**
- **Series B** (Feb 2026): $150M led by B Capital at a $1.25B valuation, with Eric Schmidt, Salesforce Ventures, and DFJ Growth participating

The Anthropic investment is the strongest possible signal. The company most associated with AI safety, whose co-founder Chris Olah essentially created the field of mechanistic interpretability, chose Goodfire as its first external bet. Anthropic views Goodfire as complementary infrastructure, not competition.

## The Team

Co-founded by Eric Ho (CEO, previously founder of RippleMatch), Dan Balsam (CTO, NYU CS), and Tom McGrath (Chief Scientist, who founded DeepMind's interpretability team). Roughly 39 employees, including researchers from OpenAI, Google, and Palantir. Nick Cammarata, who co-started OpenAI's interpretability team with Chris Olah, serves as an advisor. The company is structured as a public benefit corporation and headquartered in San Francisco.

## The Bet

As AI models become more powerful and autonomous, with agents handling real money, real patients, and real decisions, organizations will need to understand what is happening inside them. Not for ethical reasons alone, but because opaque agents are an unacceptable business risk.
Goodfire is building the only commercial platform for this. Every other interpretability team sits inside a frontier lab, serving internal goals. Goodfire is model-agnostic infrastructure available to anyone. That is why the market priced the company at $1.25B within 18 months. Safety is not the opposite of capability. It is the prerequisite for capability at scale.
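
For readers who want a concrete picture of what "modifying specific features" means, here is a minimal sketch of the general technique: sparse-autoencoder (SAE) feature decomposition and steering. This is not Goodfire's Ember API; every class, function, dimension, and feature index below is illustrative, and a real SAE would be trained on a specific model's activations rather than randomly initialized.

```python
# Illustrative sketch of SAE feature steering (not Goodfire's Ember API).
# All names, dimensions, and indices are made up for demonstration.
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    """Decomposes a model's hidden activations into sparse, interpretable features."""

    def __init__(self, d_model: int = 4096, d_features: int = 65536):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)  # activation -> feature space
        self.decoder = nn.Linear(d_features, d_model)  # feature space -> activation

    def encode(self, activation: torch.Tensor) -> torch.Tensor:
        # ReLU keeps only positively firing features, yielding a sparse code.
        return torch.relu(self.encoder(activation))

    def decode(self, features: torch.Tensor) -> torch.Tensor:
        return self.decoder(features)


@torch.no_grad()
def steer(sae: SparseAutoencoder, activation: torch.Tensor,
          feature_idx: int, scale: float) -> torch.Tensor:
    """Dial a single feature up or down, then reconstruct the activation.

    In practice the edited activation is written back into the forward pass
    (e.g. via a hook on a transformer layer), changing behavior without retraining.
    """
    features = sae.encode(activation)
    features[..., feature_idx] *= scale  # scale=0.0 ablates the feature entirely
    return sae.decode(features)


if __name__ == "__main__":
    sae = SparseAutoencoder()
    hidden = torch.randn(1, 4096)  # stand-in for a real residual-stream activation
    # Suppose feature 12_345 had been identified as a "fabricated citation" feature:
    steered = steer(sae, hidden, feature_idx=12_345, scale=0.0)
    print(steered.shape)  # torch.Size([1, 4096])
```

The design point the sketch illustrates: the intervention happens on internal activations, not on prompts or weights, which is why a single well-identified feature can be suppressed or amplified without touching the rest of the model's behavior.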