VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published Dec 31, 2024 • 48
OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints Paper • 2501.03841 • Published Jan 7, 2025 • 56
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives Paper • 2501.04003 • Published Jan 7, 2025 • 28
Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware Paper • 2505.09601 • Published May 14, 2025 • 5
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models Paper • 2507.23682 • Published 23 days ago • 23
MolmoAct: Action Reasoning Models that can Reason in Space Paper • 2508.07917 • Published 12 days ago • 38
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation Paper • 2508.05635 • Published 16 days ago • 71