Efficient-Large-Model/Sana_Sprint_1.6B_1024px_teacher Text-to-Image • Updated 22 days ago • 11 • 1
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning Paper • 2504.08837 • Published 12 days ago • 42
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation Paper • 2504.08736 • Published 11 days ago • 47
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources Paper • 2504.00595 • Published 21 days ago • 35
ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model Paper • 2503.21144 • Published 27 days ago • 25
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity Paper • 2503.16418 • Published Mar 20 • 35