VideoChat-R1 Collection VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning • 3 items • Updated about 16 hours ago • 2
Gemma 3 QAT Collection Quantization Aware Trained (QAT) Gemma 3 checkpoints. The model preserves quality similar to half precision while using 3x less memory • 15 items • Updated 4 days ago • 157
DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models Paper • 2504.02882 • Published 21 days ago • 6
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 15 days ago • 168
ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers Paper • 2504.00502 • Published 22 days ago • 21
Inference-Time Scaling for Generalist Reward Modeling Paper • 2504.02495 • Published 19 days ago • 53
Improved Visual-Spatial Reasoning via R1-Zero-Like Training Paper • 2504.00883 • Published 21 days ago • 62
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Paper • 2504.02587 • Published 19 days ago • 30
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 Paper • 2503.24376 • Published 22 days ago • 38
Self-Consistency Improves Chain of Thought Reasoning in Language Models Paper • 2203.11171 • Published Mar 21, 2022 • 5
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features Paper • 2504.00557 • Published 22 days ago • 15
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning Paper • 2504.01005 • Published 21 days ago • 15
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents Paper • 2504.00906 • Published 21 days ago • 21
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources Paper • 2504.00595 • Published 21 days ago • 35
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model Paper • 2503.24290 • Published 22 days ago • 62