SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation Paper • 2503.09641 • Published Mar 12 • 36
CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance Paper • 2503.10391 • Published Mar 13 • 10
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models Paper • 2503.10437 • Published Mar 13 • 31
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k Paper • 2503.09642 • Published Mar 12 • 17
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond Paper • 2503.10460 • Published Mar 13 • 27
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization Paper • 2503.10615 • Published Mar 13 • 16
Quantization for OpenAI's Whisper Models: A Comparative Analysis Paper • 2503.09905 • Published Mar 12 • 6
The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation Paper • 2503.10636 • Published Mar 13 • 3
Autoregressive Image Generation with Randomized Parallel Decoding Paper • 2503.10568 • Published Mar 13 • 8
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing Paper • 2503.10639 • Published Mar 13 • 48
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation Paper • 2503.10630 • Published Mar 13 • 6
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search Paper • 2503.10582 • Published Mar 13 • 22
New Trends for Modern Machine Translation with Large Reasoning Models Paper • 2503.10351 • Published Mar 13 • 22
CoSTAast: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing Paper • 2503.10613 • Published Mar 13 • 77
World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning Paper • 2503.10480 • Published Mar 13 • 50
Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models Paper • 2503.09669 • Published Mar 12 • 35