LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
First JEPA to train stably end-to-end from raw pixels using only two loss terms, removing the EMA/distillation tricks earlier JEPAs required
Key Claims
- First stable end-to-end JEPA from raw pixels — previous JEPAs (I-JEPA, V-JEPA) relied on stabilization tricks (an EMA teacher, distillation targets) to prevent representational collapse
- Only two loss terms — a major simplification over the multi-objective stacks used in DINO-style SSL
- Open source — code at lucas-maes/le-wm
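The notes don't specify what LeWM's two loss terms actually are. As an illustrative sketch only (assuming a plain predictive MSE plus a VICReg-style variance hinge as the anti-collapse term, which may differ entirely from the paper's formulation), a two-term JEPA objective could look like:

```python
import numpy as np

def jepa_two_term_loss(pred, target, embeddings, var_weight=1.0, eps=1e-4):
    """Hypothetical two-term JEPA objective: prediction error plus an
    anti-collapse variance penalty. Illustrative, NOT LeWM's actual losses."""
    # Term 1: predictive loss, predictor output vs. target-patch embedding.
    pred_loss = np.mean((pred - target) ** 2)
    # Term 2: hinge on the per-dimension std of the batch embeddings; it
    # pushes each dimension's std toward 1, so a constant (collapsed)
    # encoder output pays a penalty of roughly 1 per dimension.
    std = np.sqrt(embeddings.var(axis=0) + eps)
    var_loss = np.mean(np.maximum(0.0, 1.0 - std))
    return pred_loss + var_weight * var_loss
```

Note the trade-off this toy version exposes: a collapsed encoder zeroes term 1 but maxes out term 2, so the sum can no longer be trivially minimized by a constant representation.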
Why This Matters
The representational collapse problem is the central technical challenge for JEPAs: without careful engineering, the encoder learns to output a constant (zero-information) representation that trivially satisfies the predictive loss. Every JEPA variant to date has been a different answer to "how do we prevent collapse?" LeWM is a compelling answer because it's the simplest — if this scales, it removes one of the biggest complaints about JEPA as a recipe.
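The collapse failure mode is easy to see in a few lines: a constant encoder drives a bare predictive loss to zero while carrying no information (illustrative numbers, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def predictive_loss(z_pred, z_target):
    # Plain MSE between predicted and target embeddings.
    return np.mean((z_pred - z_target) ** 2)

# A collapsed encoder maps every input to the same vector...
collapsed = np.zeros((16, 64))
# ...so predicting the target embedding is trivially perfect:
print(predictive_loss(collapsed, collapsed))  # 0.0: zero loss, zero information

# A healthy encoder produces varied embeddings, so the predictor
# must actually model structure in the data to reduce the loss.
healthy_pred = rng.standard_normal((16, 64))
healthy_tgt = rng.standard_normal((16, 64))
print(predictive_loss(healthy_pred, healthy_tgt) > 0.0)  # True
```

This is why every JEPA needs *some* second mechanism, whether an EMA teacher, a stop-gradient, or an explicit loss term, to make the constant solution unattractive.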
Key research question going forward: does LeWM's simplicity hold at V-JEPA 2 scale, or does the simplification break down at billion-parameter / million-hour-video regimes?
Notes
Recent paper (March 2026). Deeper read deferred — frontier-tracker candidate.
Source: LeWorldModel by Lucas Maes et al.