Submitted by xhyandwyy 44 Mobile-Agent-v3: Foundamental Agents for GUI Automation · 15 authors 4.78k 3
Submitted by Kevin355 33 LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries · 14 authors 5
Submitted by haoningwu 13 SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass · 4 authors 33 2
Submitted by universea 11 aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists · 23 authors 2
Submitted by taesiri 8 ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling · 10 authors 2
Submitted by cai-qi 7 Visual Autoregressive Modeling for Instruction-Guided Image Editing · 8 authors 13 2
Submitted by taesiri 5 "Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries · 10 authors 2
Submitted by thewhole 4 Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds · 9 authors 2
Submitted by taesiri 2 When and What: Diffusion-Grounded VideoLLM with Entity Aware Segmentation for Long Video Understanding · 3 authors 2
Submitted by YirongSun 2 LLaSO: A Foundational Framework for Reproducible Research in Large Language and Speech Model · 8 authors 15 2
Submitted by amazingj 2 Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models · 7 authors 2