Explain Before You Answer: A Survey on Compositional Visual Reasoning Paper • 2508.17298 • Published 2 days ago • 2
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published about 20 hours ago • 78
Do What? Teaching Vision-Language-Action Models to Reject the Impossible Paper • 2508.16292 • Published 4 days ago • 5
SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass Paper • 2508.15769 • Published 5 days ago • 18
MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds Paper • 2508.14879 • Published 6 days ago • 51
Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation Paper • 2508.13998 • Published 7 days ago • 15
LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos Paper • 2508.14041 • Published 7 days ago • 55
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL Paper • 2508.13167 • Published 20 days ago • 111
Has GPT-5 Achieved Spatial Intelligence? An Empirical Study Paper • 2508.13142 • Published 8 days ago • 31
S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models Paper • 2508.12880 • Published 8 days ago • 43
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model Paper • 2508.13009 • Published 8 days ago • 22
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory Paper • 2508.09736 • Published 13 days ago • 51