System 2 Reasoning (Objective-Driven AI)
Active Frontier
Borrowed from Kahneman's cognitive framework: System 1 is reactive, subconscious, fast (driving a familiar route); System 2 is deliberate, effortful, slow (planning a trip, solving a puzzle). Current LLMs operate mostly as System 1 — autoregressive next-token prediction is a reactive forward pass, not a search over possibilities.
Objective-driven AI is the architectural proposal for System 2: rather than generating output directly, the system
- Perceives the world state
- Uses a world model to imagine candidate action sequences
- Evaluates predicted outcomes against a task objective and safety guardrails
- Optimizes the action sequence to minimize cost
- Executes the best sequence
The key shift is inference by optimization rather than forward propagation. A System 2 model performs gradient steps (or sampling, or search) over possible plans at inference time — forward propagation alone cannot do this.
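A minimal sketch of inference by optimization, using the simplest search variant (random shooting) over a toy 1-D additive world model. Every name here — `world_model`, `trajectory_cost`, `plan`, the goal value — is hypothetical, chosen only for illustration:

```python
import random

def world_model(state, action):
    """Stand-in for a learned predictor: next state = state + action."""
    return state + action

def trajectory_cost(states, goal=5.0):
    """Task objective: distance of the final predicted state from the goal."""
    return abs(states[-1] - goal)

def plan(s0, horizon=4, n_candidates=500):
    """Sample candidate action sequences, imagine each with the world model,
    and keep the cheapest -- optimization happening at inference time."""
    best_seq, best_cost = None, float("inf")
    for _ in range(n_candidates):
        # Guardrail baked into sampling: actions are bounded to [-1, 1].
        actions = [random.uniform(-1.0, 1.0) for _ in range(horizon)]
        states, s = [], s0
        for a in actions:
            s = world_model(s, a)
            states.append(s)
        c = trajectory_cost(states)
        if c < best_cost:
            best_seq, best_cost = actions, c
    return best_seq, best_cost
```

The outer loop over candidates is what a single forward pass lacks: the planner imagines many futures and commits only after comparing them.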
Key Claims
- Autoregressive LLM inference is architecturally System 1 — each token is a single forward pass; there's no mechanism for the model to reconsider a choice once made. Evidence: strong (definitional)
- World models are the mechanism required for System 2 — to plan, you need to imagine consequences; imagination requires a predictive model of the environment. Evidence: moderate (framework-level claim)
- Hierarchical world models enable long-horizon System 2 planning — H-WM demonstrates that symbolic-plus-visual predictors can plan multi-step TAMP problems without drift. Evidence: moderate (H-WM)
- Structured prediction beats dense rollouts for planning — StructVLA shows sparse keyframe prediction outperforms every-pixel prediction as a planning substrate. Evidence: moderate (StructVLA)
The Planning Loop (formalized)
```
perceive(world) → s_0
loop:
    plan ← optimize(actions a_{1..T}) minimizing
        cost = TaskObjective(s_T) + SafetyGuardrails(s_{1..T})
        where s_t = WorldModel(s_{t-1}, a_t)   # rollout
    execute(a_1)
    observe → s_0                              # close the loop
```
Compare with the LLM autoregressive loop:
```
loop:
    token_t ← LLM(token_{<t})
    emit(token_t)
```
System 2 requires the outer optimize(...) step. LLMs only have the inner forward pass.
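The structural difference between the two loops can be made concrete with toy stand-ins (`llm`, `world_model`, `objective`, and the candidate set are all hypothetical placeholders, not real APIs):

```python
# Autoregressive (System 1): one forward pass per token, committed immediately.
def generate(llm, prompt, n_tokens):
    tokens = list(prompt)
    for _ in range(n_tokens):
        tokens.append(llm(tokens))  # no mechanism to revisit earlier choices
    return tokens

# Objective-driven (System 2): an outer optimization wraps the forward model.
def act(world_model, objective, s0, candidates):
    def rollout(actions):
        s, traj = s0, []
        for a in actions:
            s = world_model(s, a)
            traj.append(s)
        return traj
    # The outer optimize(...) step: compare imagined futures before acting.
    best = min(candidates, key=lambda acts: objective(rollout(acts)))
    return best[0]  # execute only the first action, then replan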
Open Questions
- How much System 2 behavior can be elicited from LLMs via chain-of-thought, scratchpads, or tool-use without actual optimization?
- What's the right search algorithm for action-sequence optimization — gradient descent in latent space, sampling, MCTS?
- How do you learn the task objective? Hand-specified, LLM-generated, or inverse RL?
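On the second question, one concrete sampling-based candidate is the cross-entropy method (CEM): repeatedly sample action sequences from a Gaussian, score them through the world model, and refit the Gaussian to the elite set. A minimal sketch assuming a toy additive world model; all names are hypothetical:

```python
import random
import statistics

def cem_plan(world_model, cost, s0, horizon, iters=5, pop=200, elite=20):
    """Cross-entropy method over action sequences: sample, score, refit."""
    mu = [0.0] * horizon
    sigma = [1.0] * horizon
    for _ in range(iters):
        scored = []
        for _ in range(pop):
            actions = [random.gauss(m, s) for m, s in zip(mu, sigma)]
            st, traj = s0, []
            for a in actions:
                st = world_model(st, a)
                traj.append(st)
            scored.append((cost(traj), actions))
        scored.sort(key=lambda p: p[0])
        elites = [a for _, a in scored[:elite]]
        # Refit the sampling distribution to the cheapest sequences.
        mu = [statistics.mean(col) for col in zip(*elites)]
        sigma = [statistics.stdev(col) + 1e-3 for col in zip(*elites)]
    return mu  # the mean sequence is the plan
```

Gradient descent in latent space and MCTS occupy the same slot in the loop; CEM is just the easiest to sketch without a differentiable or discretized model.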
Related Concepts
- World Models — the simulator System 2 uses
- Hierarchical Planning — multi-level System 2
- Chain-of-Thought Reasoning — LLM-side approximation of System 2; autoregressive scratchpad
- Agentic Reasoning — a different LLM-side approximation via tool loops
Changelog
- 2026-04-22 — Initial compilation. Framed as the architectural contrast to autoregressive LLM inference.