- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
  Paper • 2401.09985 • Published • 18
- CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
  Paper • 2401.09962 • Published • 9
- Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
  Paper • 2401.10404 • Published • 11
- ActAnywhere: Subject-Aware Video Background Generation
  Paper • 2401.10822 • Published • 13
Collections including paper arxiv:2507.08801
- Seedance 1.0: Exploring the Boundaries of Video Generation Models
  Paper • 2506.09113 • Published • 98
- Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
  Paper • 2506.08009 • Published • 26
- Seeing Voices: Generating A-Roll Video from Audio with Mirage
  Paper • 2506.08279 • Published • 28
- PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
  Paper • 2506.07848 • Published • 4

- GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
  Paper • 2311.12631 • Published • 15
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 55
- VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
  Paper • 2504.01956 • Published • 40
- UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding
  Paper • 2506.23219 • Published • 7

- From Slow Bidirectional to Fast Causal Video Generators
  Paper • 2412.07772 • Published • 1
- Navigation World Models
  Paper • 2412.03572 • Published • 2
- MAGI-1: Autoregressive Video Generation at Scale
  Paper • 2505.13211 • Published • 2
- Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective
  Paper • 2507.08801 • Published • 29

- lusxvr/nanoVLM-222M
  Image-Text-to-Text • 0.2B • Updated • 772 • 94
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 35
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 96
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
  Paper • 2505.17667 • Published • 89