π0 and π0-FAST: Vision-Language-Action Models for General Robot Control Article • Feb 4 • 143
An Empirical Study of GPT-4o Image Generation Capabilities Paper • 2504.05979 • Published 12 days ago • 59
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published 12 days ago • 144
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 13 days ago • 162
Black Swan (Abductive and Defeasible Reasoning) Collection Data for CVPR 2025 paper, "Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events" • 3 items • Updated 29 days ago • 2
MedSAM2: Segment Anything in 3D Medical Images and Videos Paper • 2504.03600 • Published 16 days ago • 8
MedSAM2 Collection MedSAM2: Segment Anything in 3D Medical Images and Videos • 4 items • Updated 8 days ago • 3
Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages Paper • 2503.23542 • Published 21 days ago • 10
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Paper • 2504.02587 • Published 17 days ago • 30
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation Paper • 2504.02782 • Published 17 days ago • 55
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis Paper • 2502.18924 • Published Feb 26 • 12
TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting Paper • 2503.17032 • Published about 1 month ago • 25