System 2 Reasoning (Objective-Driven AI)
Active Frontier
Borrowed from Kahneman's cognitive framework: System 1 is reactive, subconscious, fast (driving a familiar route); System 2 is deliberate, effortful, slow (planning a trip, solving a puzzle). Current LLMs operate mostly as System 1 — autoregressive next-token prediction is a reactive forward pass, not a search over possibilities.
Objective-driven AI is the architectural proposal for System 2: rather than generating output directly, the system
- Perceives the world state
- Uses a world model to imagine candidate action sequences
- Evaluates predicted outcomes against a task objective and safety guardrails
- Optimizes the action sequence to minimize cost
- Executes the best sequence
The key shift is inference by optimization rather than forward propagation. A System 2 model performs gradient steps (or sampling, or search) over possible plans at inference time — forward propagation alone cannot do this.
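A minimal sketch of inference by optimization, using the simplest search variant (random shooting) over a toy 1-D additive world model. Every name here — `world_model`, `trajectory_cost`, `plan`, the goal value — is hypothetical, chosen only for illustration:

```python
import random

def world_model(state, action):
    """Stand-in for a learned predictor: next state = state + action."""
    return state + action

def trajectory_cost(states, goal=5.0):
    """Task objective: distance of the final predicted state from the goal."""
    return abs(states[-1] - goal)

def plan(s0, horizon=4, n_candidates=500):
    """Sample candidate action sequences, imagine each with the world model,
    and keep the cheapest -- optimization happening at inference time."""
    best_seq, best_cost = None, float("inf")
    for _ in range(n_candidates):
        # Guardrail baked into sampling: actions are bounded to [-1, 1].
        actions = [random.uniform(-1.0, 1.0) for _ in range(horizon)]
        states, s = [], s0
        for a in actions:
            s = world_model(s, a)
            states.append(s)
        c = trajectory_cost(states)
        if c < best_cost:
            best_seq, best_cost = actions, c
    return best_seq, best_cost
```

The outer loop over candidates is what a single forward pass lacks: the planner imagines many futures and commits only after comparing them.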
Key Claims
- Autoregressive LLM inference is architecturally System 1 — each token is a single forward pass; there's no mechanism for the model to reconsider a choice once made. Evidence: strong (definitional)
- World models are the mechanism required for System 2 — to plan, you need to imagine consequences; imagination requires a predictive model of the environment. Evidence: moderate (framework-level claim)
- Hierarchical world models enable long-horizon System 2 planning — H-WM demonstrates that symbolic-plus-visual predictors can plan multi-step TAMP problems without drift. Evidence: moderate (H-WM)
- Structured prediction beats dense rollouts for planning — StructVLA shows sparse keyframe prediction outperforms every-pixel prediction as a planning substrate. Evidence: moderate (StructVLA)
The Planning Loop (formalized)
```
perceive(world) → s_0
loop:
    plan ← optimize(actions a_{1..T}) minimizing
        cost = TaskObjective(s_T) + SafetyGuardrails(s_{1..T})
        where s_t = WorldModel(s_{t-1}, a_t)   # rollout
    execute(a_1)
    observe → s_0                              # close the loop
```
Compare with the LLM autoregressive loop:
```
loop:
    token_t ← LLM(token_{<t})
    emit(token_t)
```
System 2 requires the outer optimize(...) step. LLMs only have the inner forward pass.
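The structural difference between the two loops can be made concrete with toy stand-ins (`llm`, `world_model`, `objective`, and the candidate set are all hypothetical placeholders, not real APIs):

```python
# Autoregressive (System 1): one forward pass per token, committed immediately.
def generate(llm, prompt, n_tokens):
    tokens = list(prompt)
    for _ in range(n_tokens):
        tokens.append(llm(tokens))  # no mechanism to revisit earlier choices
    return tokens

# Objective-driven (System 2): an outer optimization wraps the forward model.
def act(world_model, objective, s0, candidates):
    def rollout(actions):
        s, traj = s0, []
        for a in actions:
            s = world_model(s, a)
            traj.append(s)
        return traj
    # The outer optimize(...) step: compare imagined futures before acting.
    best = min(candidates, key=lambda acts: objective(rollout(acts)))
    return best[0]  # execute only the first action, then replan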
Open Questions
- How much System 2 behavior can be elicited from LLMs via chain-of-thought, scratchpads, or tool-use without actual optimization?
- What's the right search algorithm for action-sequence optimization — gradient descent in latent space, sampling, MCTS?
- How do you learn the task objective? Hand-specified, LLM-generated, or inverse RL?
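On the second question, one concrete sampling-based candidate is the cross-entropy method (CEM): repeatedly sample action sequences from a Gaussian, score them through the world model, and refit the Gaussian to the elite set. A minimal sketch assuming a toy additive world model; all names are hypothetical:

```python
import random
import statistics

def cem_plan(world_model, cost, s0, horizon, iters=5, pop=200, elite=20):
    """Cross-entropy method over action sequences: sample, score, refit."""
    mu = [0.0] * horizon
    sigma = [1.0] * horizon
    for _ in range(iters):
        scored = []
        for _ in range(pop):
            actions = [random.gauss(m, s) for m, s in zip(mu, sigma)]
            st, traj = s0, []
            for a in actions:
                st = world_model(st, a)
                traj.append(st)
            scored.append((cost(traj), actions))
        scored.sort(key=lambda p: p[0])
        elites = [a for _, a in scored[:elite]]
        # Refit the sampling distribution to the cheapest sequences.
        mu = [statistics.mean(col) for col in zip(*elites)]
        sigma = [statistics.stdev(col) + 1e-3 for col in zip(*elites)]
    return mu  # the mean sequence is the plan
```

Gradient descent in latent space and MCTS occupy the same slot in the loop; CEM is just the easiest to sketch without a differentiable or discretized model.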
Related Concepts
- World Models — the simulator System 2 uses
- Hierarchical Planning — multi-level System 2
- Chain-of-Thought Reasoning — LLM-side approximation of System 2; autoregressive scratchpad
- Agentic Reasoning — a different LLM-side approximation via tool loops
Changelog
- 2026-04-22 — Initial compilation. Framed as the architectural contrast to autoregressive LLM inference.