WORLDMEM: Long-term Consistent World Simulation with Memory
Abstract
World simulation has gained increasing popularity due to its ability to model virtual environments and predict the consequences of actions. However, the limited temporal context window often leads to failures in maintaining long-term consistency, particularly in preserving 3D spatial consistency. In this work, we present WorldMem, a framework that enhances scene generation with a memory bank consisting of memory units that store memory frames and states (e.g., poses and timestamps). By employing a memory attention mechanism that effectively extracts relevant information from these memory frames based on their states, our method is capable of accurately reconstructing previously observed scenes, even under significant viewpoint or temporal gaps. Furthermore, by incorporating timestamps into the states, our framework not only models a static world but also captures its dynamic evolution over time, enabling both perception and interaction within the simulated world. Extensive experiments in both virtual and real scenarios validate the effectiveness of our approach.
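The abstract describes the memory attention mechanism only at a high level. Below is a minimal, hypothetical sketch of how such a block could be structured, assuming a PyTorch implementation in which each retrieved memory frame's state (pose and timestamp) is embedded and added to that frame's key/value tokens before cross-attention. All class, argument, and tensor names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): the current frame's tokens
# attend over tokens of retrieved memory frames, with each memory frame's
# state (pose, timestamp) injected as an additive embedding on its keys/values.
import torch
import torch.nn as nn


class MemoryAttention(nn.Module):
    def __init__(self, dim: int, state_dim: int, num_heads: int = 8):
        super().__init__()
        self.state_embed = nn.Linear(state_dim, dim)   # embeds pose + timestamp
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, frame_feat, memory_feats, memory_states):
        # frame_feat:    (B, N, D)     tokens of the frame being generated
        # memory_feats:  (B, M, N, D)  tokens of M retrieved memory frames
        # memory_states: (B, M, S)     per-frame state vectors (pose, time)
        B, M, N, D = memory_feats.shape
        state = self.state_embed(memory_states)                    # (B, M, D)
        mem = (memory_feats + state[:, :, None, :]).reshape(B, M * N, D)
        out, _ = self.attn(query=frame_feat, key=mem, value=mem)
        return self.norm(frame_feat + out)                         # residual update


if __name__ == "__main__":
    block = MemoryAttention(dim=256, state_dim=8)
    x = torch.randn(2, 64, 256)          # current-frame tokens
    mem = torch.randn(2, 4, 64, 256)     # 4 memory frames
    states = torch.randn(2, 4, 8)        # pose + timestamp per memory frame
    print(block(x, mem, states).shape)   # torch.Size([2, 64, 256])
```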
Community
The following related papers were recommended by the Semantic Scholar API:
- CamContextI2V: Context-aware Controllable Video Generation (2025)
- MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving (2025)
- Long-Context Autoregressive Video Modeling with Next-Frame Prediction (2025)
- Generating Multimodal Driving Scenes via Next-Scene Prediction (2025)
- TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation (2025)
- EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation (2025)
- Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception (2025)