PAPER · 2026-04-23 · Multi-institution · arXiv 2604.15726

LLM Reasoning Is Latent, Not the Chain of Thought

arXiv preprint authors

COMPILED NOTES

Argues that LLM reasoning should be studied as latent-state trajectory formation rather than as faithful surface CoT, with implications for interpretability, alignment, and training-objective design.

Abstract

A new arXiv preprint (2604.15726, April 2026) argues that LLM reasoning should be studied as latent-state trajectory formation rather than as faithful surface chain-of-thought (CoT). The paper challenges the prevailing assumption that visible CoT tokens reflect the computation the model actually performs. On its account, CoT is closer to a post-hoc rationalization, while the real reasoning happens in latent representations that the surface tokens do not always represent faithfully.

Key Contributions

Reframes CoT as post-hoc surface rationalization rather than the underlying reasoning process.
Proposes the latent-state trajectory as the analytical primitive: reasoning is a path through hidden representations, not a token sequence.
Implications for interpretability: CoT auditing is not a reliable safety/alignment lever if it doesn't reflect actual computation.
Implications for training: pure CoT-supervision objectives may be optimizing the wrong thing.
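
To make the "path through hidden representations" framing concrete, the sketch below summarizes a trajectory with a few simple geometric statistics. It is an illustration only, not the paper's method: the function names, the choice of statistics, and the two-dimensional toy vectors are all assumptions, and real hidden states would come from a model's activations at each reasoning step.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def trajectory_stats(hidden_states):
    """Summarize a path through representation space.

    hidden_states: list of equal-length vectors, one per step
    (hypothetical input format chosen for this sketch).
    """
    pairs = list(zip(hidden_states, hidden_states[1:]))
    step_lengths = [math.dist(a, b) for a, b in pairs]
    return {
        # Total distance travelled along the path.
        "path_length": sum(step_lengths),
        # Straight-line distance from the first state to the last.
        "net_displacement": math.dist(hidden_states[0], hidden_states[-1]),
        # Directional similarity between consecutive states.
        "step_cosines": [cosine_similarity(a, b) for a, b in pairs],
    }

# Toy trajectory (made-up numbers, not model activations).
toy_trajectory = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
stats = trajectory_stats(toy_trajectory)
```

A path length much larger than the net displacement would indicate a winding trajectory. Whether such statistics track anything meaningful about reasoning is an empirical question the sketch does not answer.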

Methodology

The authors use mechanistic interpretability tools (probes, sparse autoencoders, intervention experiments) to identify discrepancies between visible CoT content and latent computations. They demonstrate cases where the model's behavior matches latent trajectory predictions even when CoT content is altered or absent.
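
As an illustration of what a probe is, the sketch below fits a mean-difference linear probe, one of the simplest probe types, on synthetic vectors. This is not the authors' setup: the notes do not specify their probe architecture, and the synthetic "hidden states", the helper names, and the planted signal in feature 0 are all assumptions made for the example.

```python
import random

def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit_mean_difference_probe(states, labels):
    """Linear probe whose direction is mean(class 1) - mean(class 0)."""
    pos = [s for s, y in zip(states, labels) if y == 1]
    neg = [s for s, y in zip(states, labels) if y == 0]
    mu_pos, mu_neg = mean_vector(pos), mean_vector(neg)
    direction = [p - q for p, q in zip(mu_pos, mu_neg)]
    midpoint = [(p + q) / 2 for p, q in zip(mu_pos, mu_neg)]
    # Place the decision threshold at the midpoint between class means.
    bias = -sum(d * m for d, m in zip(direction, midpoint))
    return direction, bias

def probe_predict(direction, bias, state):
    """Classify a state by the sign of its projection onto the probe."""
    score = sum(d * x for d, x in zip(direction, state)) + bias
    return 1 if score > 0 else 0

def make_synthetic_states(n, dim, seed):
    """Hypothetical stand-in for hidden states: feature 0 encodes the label."""
    rng = random.Random(seed)
    states, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        h = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        h[0] += 4.0 if y == 1 else -4.0
        states.append(h)
        labels.append(y)
    return states, labels

train_states, train_labels = make_synthetic_states(200, 8, seed=0)
test_states, test_labels = make_synthetic_states(100, 8, seed=1)
direction, bias = fit_mean_difference_probe(train_states, train_labels)
correct = sum(
    probe_predict(direction, bias, s) == y
    for s, y in zip(test_states, test_labels)
)
accuracy = correct / len(test_labels)
```

High probe accuracy shows only that the information is linearly decodable from the states, not that the model uses it. That gap is why probes are typically paired with intervention experiments.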

Results

Multiple model families show evidence of CoT-latent decoupling.
Some reasoning patterns appear in latent trajectories without corresponding surface CoT representation.
The findings connect to the "CoT-faithfulness" literature (Anthropic, Apollo Research).
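
One way to quantify CoT-latent decoupling of the kind described above is sketched here. It is a hypothetical bookkeeping scheme, not the paper's metric: the `Trial` fields, the two summary rates, and the toy records are assumptions made for illustration, and the toy values are not results from the paper.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    answer_original: str      # answer produced with the model's own CoT
    answer_cot_altered: str   # answer after the CoT was altered or removed
    latent_prediction: str    # answer predicted from latent states (e.g. by a probe)

def decoupling_summary(trials):
    """Two rates over a set of trials.

    answer_invariance: fraction of answers unchanged by altering the CoT.
    latent_agreement: fraction of altered-CoT answers that the latent
    prediction matched.
    """
    n = len(trials)
    invariant = sum(t.answer_original == t.answer_cot_altered for t in trials)
    latent_match = sum(t.latent_prediction == t.answer_cot_altered for t in trials)
    return {
        "answer_invariance": invariant / n,
        "latent_agreement": latent_match / n,
    }

# Toy records (made up for the example).
toy_trials = [
    Trial("A", "A", "A"),
    Trial("B", "B", "B"),
    Trial("C", "D", "D"),
    Trial("A", "A", "B"),
]
summary = decoupling_summary(toy_trials)
```

Under this scheme, high values on both rates would be consistent with the answer being driven by latent computation rather than by the visible CoT text, though neither rate establishes that on its own.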

Limitations

Mechanistic interpretability tools have known limitations (feature splitting, polysemanticity).
Paper focuses on specific model families and reasoning types — generalization claims need broader validation.
Practical implications for alignment depend on whether latent-state interventions become tractable.

Full Content

This paper sits at the intersection of three active research programs: (1) chain-of-thought efficacy and faithfulness, (2) mechanistic interpretability, and (3) LLM alignment via reasoning supervision. If CoT is post-hoc rationalization, then alignment strategies that rely on CoT auditing (e.g., monitoring an agent's chain-of-thought for malign intent) are weaker than they appear.

This complements the parallel "CoT length scaling" literature (Wei et al. 2022, "When More Is Less" 2502.07266, etc.), which asks how long a CoT should be. The latent-reasoning paper asks a more basic question: is CoT the right primitive at all?

Practical implications:

For alignment teams: CoT auditing is a partial signal at best.
For research labs: training objectives may need to incorporate latent-state shaping, not just CoT-token supervision.
For interpretability: latent trajectory analysis becomes a higher-priority research direction.
For users: trust in visible LLM reasoning steps should be calibrated against this finding, since visible CoT is not always faithful to the actual computation.

Source: arXiv 2604.15726 — LLM Reasoning Is Latent, Not the Chain of Thought, April 2026
