MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding Paper • 2507.12463 • Published Jul 16 • 26
MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding Paper • 2507.12463 • Published Jul 16 • 26
VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction Paper • 2505.20279 • Published May 26 • 4
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN Paper • 2412.13795 • Published Dec 18, 2024 • 20
InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds Paper • 2403.20309 • Published Mar 29, 2024 • 19