ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness Paper • 2504.10514 • Published 8 days ago • 43
Cobra: Efficient Line Art COlorization with BRoAder References Paper • 2504.12240 • Published 2 days ago • 22
An Empirical Study of GPT-4o Image Generation Capabilities Paper • 2504.05979 • Published 10 days ago • 59
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation Paper • 2504.02160 • Published 16 days ago • 33
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published 10 days ago • 143
VACE Collection VACE: All-in-One Video Creation and Editing • 5 items • Updated 12 days ago • 12
Concept Lancet: Image Editing with Compositional Representation Transplant Paper • 2504.02828 • Published 15 days ago • 16
One-Minute Video Generation with Test-Time Training Paper • 2504.05298 • Published 11 days ago • 94
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning Paper • 2504.02949 • Published 15 days ago • 19
Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmonization Paper • 2504.03011 • Published 15 days ago • 9
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation Paper • 2504.02542 • Published 15 days ago • 41
SkyReels-A2: Compose Anything in Video Diffusion Transformers Paper • 2504.02436 • Published 16 days ago • 35
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization Paper • 2503.23377 • Published 20 days ago • 51
WikiVideo: Article Generation from Multiple Videos Paper • 2504.00939 • Published 17 days ago • 36
GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors Paper • 2504.01016 • Published 17 days ago • 28