Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding Paper • 2504.10465 • Published Apr 14 • 27
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding Paper • 2504.13180 • Published Apr 17 • 18
google/siglip-so400m-patch14-384 Zero-Shot Image Classification • 0.9B • Updated Sep 26, 2024 • 3.7M • 585
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning Paper • 2505.14231 • Published May 20 • 53
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization Paper • 2504.12083 • Published Apr 16 • 4
Leaderboard: Physical Reasoning from Video Space • Running • 16 • Submit and evaluate model performance on video and text tasks