DreamScene: 3D Gaussian-based End-to-end Text-to-3D Scene Generation
Abstract
DreamScene is an end-to-end framework that generates high-quality, editable 3D scenes from text or dialogue, achieving automation, 3D consistency, and fine-grained control through scene planning, graph-based placement, formation pattern sampling, and progressive camera sampling.
Generating 3D scenes from natural language holds great promise for applications in gaming, film, and design. However, existing methods struggle with automation, 3D consistency, and fine-grained control. We present DreamScene, an end-to-end framework for high-quality and editable 3D scene generation from text or dialogue. DreamScene begins with a scene planning module, where a GPT-4 agent infers object semantics and spatial constraints to construct a hybrid graph. A graph-based placement algorithm then produces a structured, collision-free layout. Based on this layout, Formation Pattern Sampling (FPS) generates object geometry using multi-timestep sampling and reconstructive optimization, enabling fast and realistic synthesis. To ensure global consistency, DreamScene employs a progressive camera sampling strategy tailored to both indoor and outdoor settings. Finally, the system supports fine-grained scene editing, including object movement, appearance changes, and 4D dynamic motion. Experiments demonstrate that DreamScene surpasses prior methods in quality, consistency, and flexibility, offering a practical solution for open-domain 3D content creation. Code and demos are available at https://jahnsonblack.github.io/DreamScene-Full/.
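To make the layout step concrete, here is a minimal sketch of graph-constrained, collision-free object placement via rejection sampling. This is an illustrative simplification, not DreamScene's actual placement algorithm: the `place_scene` function, the axis-aligned 2D footprints, and the distance-only constraint edges are all assumptions introduced for this example, whereas the paper's method operates on a hybrid graph built by a GPT-4 agent.

```python
import random
from dataclasses import dataclass

@dataclass
class Box:
    # Object footprint: center (x, y) with width w and depth d.
    x: float
    y: float
    w: float
    d: float

    def overlaps(self, other: "Box") -> bool:
        # Axis-aligned overlap test on centers and half-extents.
        return (abs(self.x - other.x) < (self.w + other.w) / 2 and
                abs(self.y - other.y) < (self.d + other.d) / 2)

def place_scene(objects, constraints, room=(6.0, 6.0), max_tries=1000):
    """Rejection-sample positions until every pairwise distance
    constraint holds and no two footprints collide.

    objects: name -> (width, depth)
    constraints: list of (a, b, min_dist, max_dist) graph edges
    """
    rw, rd = room
    for _ in range(max_tries):
        # Sample each object uniformly, keeping footprints inside the room.
        layout = {
            name: Box(random.uniform(w / 2, rw - w / 2),
                      random.uniform(d / 2, rd - d / 2), w, d)
            for name, (w, d) in objects.items()
        }
        boxes = list(layout.values())
        # Reject layouts with any pairwise collision.
        if any(a.overlaps(b)
               for i, a in enumerate(boxes) for b in boxes[i + 1:]):
            continue
        # Reject layouts violating any constraint edge of the graph.
        if all(lo <= ((layout[a].x - layout[b].x) ** 2 +
                      (layout[a].y - layout[b].y) ** 2) ** 0.5 <= hi
               for a, b, lo, hi in constraints):
            return layout
    raise RuntimeError("no feasible layout found")

# Hypothetical scene: a sofa near a TV, with a table close to the sofa.
objects = {"sofa": (2.0, 0.9), "tv": (1.2, 0.4), "table": (1.0, 0.6)}
constraints = [("sofa", "tv", 2.0, 4.0), ("table", "sofa", 0.5, 1.5)]
print(place_scene(objects, constraints))
```

Rejection sampling suffices for this toy setup; a structured placement algorithm like the one in the paper scales to dense scenes where random sampling would rarely find a feasible configuration.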
Community
DreamScene is an end-to-end framework for high-quality, consistent, and editable 3D scene generation from text. It is an extended version of our ECCV 2024 paper “DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling.”
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- Video Perception Models for 3D Scene Synthesis (2025)
- LLM-driven Indoor Scene Layout Generation via Scaled Human-aligned Data Synthesis and Multi-Stage Preference Optimization (2025)
- X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability (2025)
- RoomCraft: Controllable and Complete 3D Indoor Scene Generation (2025)
- Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning (2025)
- HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels (2025)
- CoCo4D: Comprehensive and Complex 4D Scene Generation (2025)