Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Paper • 2504.12626 • Published 6 days ago • 45
How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients Paper • 2504.10766 • Published 8 days ago • 39
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness Paper • 2504.10514 • Published 12 days ago • 45
Wan: Open and Advanced Large-Scale Video Generative Models Paper • 2503.20314 • Published 28 days ago • 49
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning Paper • 2503.04812 • Published Mar 4 • 14
YuE: Scaling Open Foundation Models for Long-Form Music Generation Paper • 2503.08638 • Published Mar 11 • 63
LLM as a Broken Telephone: Iterative Generation Distorts Information Paper • 2502.20258 • Published Feb 27 • 26
CrowdSelect: Synthetic Instruction Data Selection with Multi-LLM Wisdom Paper • 2503.01836 • Published Mar 3 • 13
UniTok: A Unified Tokenizer for Visual Generation and Understanding Paper • 2502.20321 • Published Feb 27 • 30
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think Paper • 2502.20172 • Published Feb 27 • 28
CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale Paper • 2502.16645 • Published Feb 23 • 22
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective Paper • 2502.14296 • Published Feb 20 • 46
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16 • 155
Preference Leakage: A Contamination Problem in LLM-as-a-judge Paper • 2502.01534 • Published Feb 3 • 40