You Do Not Fully Utilize Transformer's Representation Capacity Paper • 2502.09245 • Published Feb 13 • 37
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference Paper • 2502.18137 • Published Feb 25 • 56
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers Paper • 2504.10483 • Published 9 days ago • 20
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 9 days ago • 239
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think Paper • 2502.20172 • Published Feb 27 • 28
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation Paper • 2502.07870 • Published Feb 11 • 44
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features Paper • 2502.04320 • Published Feb 6 • 37
LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer Paper • 2502.01105 • Published Feb 3 • 20
SliderSpace: Decomposing the Visual Capabilities of Diffusion Models Paper • 2502.01639 • Published Feb 3 • 25