InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 9 days ago • 239
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning Paper • 2504.08837 • Published 13 days ago • 42
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Paper • 2504.07096 • Published 14 days ago • 73
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought Paper • 2504.05599 • Published 15 days ago • 81
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 16 days ago • 169
Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published 13 days ago • 27
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond Paper • 2503.21614 • Published 27 days ago • 39
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features Paper • 2504.00557 • Published 22 days ago • 15
Improved Visual-Spatial Reasoning via R1-Zero-Like Training Paper • 2504.00883 • Published 22 days ago • 62
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Paper • 2504.02587 • Published 20 days ago • 30
Where do Large Vision-Language Models Look at when Answering Questions? Paper • 2503.13891 • Published Mar 18 • 8
See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias Paper • 2503.13834 • Published Mar 18 • 5
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization Paper • 2503.12937 • Published Mar 17 • 28
Llama 3.1 Collection This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Dec 6, 2024 • 663