InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning Paper • 2502.11573 • Published Feb 17 • 8
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking Paper • 2502.02339 • Published Feb 4 • 22
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model Paper • 2502.11775 • Published Feb 17 • 9
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Paper • 2412.18319 • Published Dec 24, 2024 • 40
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 128
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models Paper • 2502.16033 • Published Feb 22 • 18
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning Paper • 2502.19634 • Published Feb 26 • 64
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning Paper • 2503.07365 • Published Mar 10 • 62
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models Paper • 2503.06749 • Published Mar 9 • 31
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL Paper • 2503.07536 • Published Mar 10 • 89
Diving into Self-Evolving Training for Multimodal Reasoning Paper • 2412.17451 • Published Dec 23, 2024 • 44
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2411.14432 • Published Nov 21, 2024 • 26
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning Paper • 2503.05379 • Published Mar 7 • 39
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning Paper • 2503.10291 • Published Mar 13 • 37
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization Paper • 2503.10615 • Published Mar 13 • 17
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey Paper • 2503.12605 • Published Mar 16 • 36
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization Paper • 2503.12937 • Published Mar 17 • 30
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding Paper • 2503.12797 • Published Mar 17 • 32
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement Paper • 2503.17352 • Published Mar 21 • 24
MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems Paper • 2503.16549 • Published Mar 19 • 15
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning Paper • 2503.18013 • Published Mar 23 • 20
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning Paper • 2503.21620 • Published Mar 27 • 63
OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning Paper • 2503.16081 • Published Mar 20 • 28
Improved Visual-Spatial Reasoning via R1-Zero-Like Training Paper • 2504.00883 • Published Apr 1 • 66
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Paper • 2504.02587 • Published Apr 3 • 32
Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1) Paper • 2504.03151 • Published Apr 4 • 14
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought Paper • 2504.05599 • Published Apr 8 • 86
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning Paper • 2504.06958 • Published Apr 9 • 11
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 280
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning Paper • 2504.08837 • Published Apr 10 • 43
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning Paper • 2504.09641 • Published Apr 13 • 16
VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search Paper • 2504.09130 • Published Apr 12 • 12
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation Paper • 2504.13055 • Published Apr 17 • 19
InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners Paper • 2504.14239 • Published Apr 19 • 14
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning Paper • 2504.16656 • Published Apr 23 • 58
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning Paper • 2505.03318 • Published May 6 • 94
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models Paper • 2505.04921 • Published May 8 • 184
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains Paper • 2505.03981 • Published May 6 • 15
Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning Paper • 2505.07263 • Published May 12 • 30
Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM? Paper • 2505.09439 • Published May 14 • 9
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning Paper • 2505.08617 • Published May 13 • 42
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning Paper • 2505.11049 • Published May 16 • 61
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision Paper • 2505.13427 • Published May 19 • 26
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning Paper • 2505.12081 • Published May 17 • 18
VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank Paper • 2505.14460 • Published May 20 • 31
Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning Paper • 2505.14677 • Published May 20 • 15
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning Paper • 2505.14231 • Published May 20 • 53
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning Paper • 2505.15966 • Published May 21 • 53
GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning Paper • 2505.17022 • Published May 22 • 27
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward Paper • 2505.17018 • Published May 22 • 15
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models Paper • 2505.16854 • Published May 22 • 11
SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information Paper • 2505.13237 • Published May 19 • 1
VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought Paper • 2505.16192 • Published May 22 • 12
Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration Paper • 2505.20256 • Published May 26 • 17
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning Paper • 2505.13426 • Published May 19 • 13
STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs Paper • 2505.15804 • Published May 21 • 10
Jodi: Unification of Visual Generation and Understanding via Joint Modeling Paper • 2505.19084 • Published May 25 • 20
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Gudied Iterative Policy Optimization Paper • 2505.19000 • Published May 25 • 43
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? Paper • 2505.21374 • Published May 27 • 27
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO Paper • 2505.21457 • Published May 27 • 14
Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL Paper • 2505.17952 • Published May 23 • 20
R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO Paper • 2505.16673 • Published May 22 • 2
Sherlock: Self-Correcting Reasoning in Vision-Language Models Paper • 2505.22651 • Published May 28 • 51
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO Paper • 2505.22453 • Published May 28 • 46
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start Paper • 2505.22334 • Published May 28 • 37
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence Paper • 2505.23747 • Published May 29 • 68
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning Paper • 2505.23380 • Published May 29 • 23
cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning Paper • 2505.22914 • Published May 28 • 35
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models Paper • 2505.21523 • Published May 23 • 14
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning Paper • 2506.01713 • Published Jun 2 • 47
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning Paper • 2506.04207 • Published Jun 4 • 46
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs Paper • 2506.05328 • Published Jun 5 • 20
Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning Paper • 2506.04559 • Published Jun 5 • 2
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation Paper • 2506.04614 • Published Jun 5 • 16
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation Paper • 2506.09790 • Published Jun 11 • 53
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO Paper • 2506.07464 • Published Jun 9 • 14
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning Paper • 2506.13654 • Published Jun 16 • 44
Enhancing Step-by-Step and Verifiable Medical Reasoning in MLLMs Paper • 2506.16962 • Published Jun 20 • 10
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning Paper • 2506.16141 • Published Jun 19 • 27
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing Paper • 2506.21448 • Published Jun 26 • 8
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1 • 229
HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context Paper • 2506.21277 • Published Jun 26 • 15
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers Paper • 2506.23918 • Published Jun 30 • 86
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning Paper • 2507.05920 • Published Jul 8 • 11
Perception-Aware Policy Optimization for Multimodal Reasoning Paper • 2507.06448 • Published Jul 8 • 45
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning Paper • 2507.05255 • Published Jul 7 • 71
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17 • 72
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning Paper • 2507.16746 • Published about 1 month ago • 33
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Paper • 2507.16815 • Published about 1 month ago • 37
Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning Paper • 2507.16814 • Published about 1 month ago • 21
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning Paper • 2507.22607 • Published 23 days ago • 45
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning Paper • 2503.15558 • Published Mar 18 • 51
MolmoAct: Action Reasoning Models that can Reason in Space Paper • 2508.07917 • Published 11 days ago • 38
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning Paper • 2508.10433 • Published 8 days ago • 141
HumanSense: From Multimodal Perception to Empathetic Context-Aware Responses through Reasoning MLLMs Paper • 2508.10576 • Published 8 days ago • 11