zzfive's Collections: RL+reason model
RL + Transformer = A General-Purpose Problem Solver • Paper • 2501.14176 • Published • 28
Towards General-Purpose Model-Free Reinforcement Learning • Paper • 2501.16142 • Published • 31
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • Paper • 2501.17161 • Published • 123
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization • Paper • 2412.12098 • Published • 5
RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning • Paper • 2412.09858 • Published • 2
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs • Paper • 2501.18585 • Published • 61
o3-mini vs DeepSeek-R1: Which One is Safer? • Paper • 2501.18438 • Published • 24
s1: Simple test-time scaling • Paper • 2501.19393 • Published • 125
Process Reinforcement through Implicit Rewards • Paper • 2502.01456 • Published • 62
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles • Paper • 2502.01081 • Published • 14
Improving Transformer World Models for Data-Efficient RL • Paper • 2502.01591 • Published • 9
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search • Paper • 2502.02508 • Published • 23
Demystifying Long Chain-of-Thought Reasoning in LLMs • Paper • 2502.03373 • Published • 59
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking • Paper • 2502.02339 • Published • 22
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods • Paper • 2502.01618 • Published • 10
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation • Paper • 2502.03860 • Published • 25
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models • Paper • 2502.04404 • Published • 24
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling • Paper • 2502.06703 • Published • 154
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates • Paper • 2502.06772 • Published • 21
LLMs Can Easily Learn to Reason from Demonstrations. Structure, not content, is what matters! • Paper • 2502.07374 • Published • 41
Teaching Language Models to Critique via Reinforcement Learning • Paper • 2502.03492 • Published • 24
An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging • Paper • 2502.09056 • Published • 32
Logical Reasoning in Large Language Models: A Survey • Paper • 2502.09100 • Published • 23
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks • Paper • 2502.08235 • Published • 59
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model • Paper • 2502.11775 • Published • 9
Soundwave: Less is More for Speech-Text Alignment in LLMs • Paper • 2502.12900 • Published • 86
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? • Paper • 2502.12215 • Published • 16
Small Models Struggle to Learn from Strong Reasoners • Paper • 2502.12143 • Published • 39
Thinking Preference Optimization • Paper • 2502.13173 • Published • 17
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning • Paper • 2502.14768 • Published • 48
LightThinker: Thinking Step-by-Step Compression • Paper • 2502.15589 • Published • 29
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer • Paper • 2502.15631 • Published • 9
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models • Paper • 2502.16033 • Published • 18
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution • Paper • 2502.18449 • Published • 74
Self-rewarding correction for mathematical reasoning • Paper • 2502.19613 • Published • 84
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning • Paper • 2502.19634 • Published • 64
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving • Paper • 2502.20238 • Published • 24
Visual-RFT: Visual Reinforcement Fine-Tuning • Paper • 2503.01785 • Published • 80
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers • Paper • 2502.20545 • Published • 22
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs • Paper • 2503.01307 • Published • 39
Efficient Test-Time Scaling via Self-Calibration • Paper • 2503.00031 • Published • 15
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition • Paper • 2503.00735 • Published • 22
START: Self-taught Reasoner with Tools • Paper • 2503.04625 • Published • 114
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching • Paper • 2503.05179 • Published • 46
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model • Paper • 2503.05132 • Published • 58
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning • Paper • 2503.05592 • Published • 27
Learning from Failures in Multi-Attempt Reinforcement Learning • Paper • 2503.04808 • Published • 18
An Empirical Study on Eliciting and Improving R1-like Reasoning Models • Paper • 2503.04548 • Published • 8
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning • Paper • 2503.07365 • Published • 62
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models • Paper • 2503.06749 • Published • 31
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL • Paper • 2503.07536 • Published • 88
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training • Paper • 2503.08525 • Published • 17
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning • Paper • 2503.09516 • Published • 35
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing • Paper • 2503.10639 • Published • 52
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning • Paper • 2503.10291 • Published • 37
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond • Paper • 2503.10460 • Published • 29
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization • Paper • 2503.10615 • Published • 17
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey • Paper • 2503.12605 • Published • 36
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization • Paper • 2503.12937 • Published • 30
reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs • Paper • 2503.11751 • Published • 16
DAPO: An Open-Source LLM Reinforcement Learning System at Scale • Paper • 2503.14476 • Published • 137
Temporal Consistency for LLM Reasoning Process Error Identification • Paper • 2503.14495 • Published • 11
Towards Self-Improving Systematic Cognition for Next-Generation Foundation MLLMs • Paper • 2503.12303 • Published • 7
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks • Paper • 2503.15478 • Published • 13
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models • Paper • 2503.16419 • Published • 76
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement • Paper • 2503.17352 • Published • 24
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders • Paper • 2503.18878 • Published • 121
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild • Paper • 2503.18892 • Published • 32
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning • Paper • 2503.18013 • Published • 20
Mind with Eyes: from Language Reasoning to Multimodal Reasoning • Paper • 2503.18071 • Published • 3
Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing • Paper • 2503.19385 • Published • 34
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking • Paper • 2503.19855 • Published • 29
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning • Paper • 2503.19470 • Published • 19
ViLBench: A Suite for Vision-Language Process Reward Modeling • Paper • 2503.20271 • Published • 7
Video-R1: Reinforcing Video Reasoning in MLLMs • Paper • 2503.21776 • Published • 80
OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning • Paper • 2503.16081 • Published • 28
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond • Paper • 2503.21614 • Published • 42
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback • Paper • 2503.22230 • Published • 46
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model • Paper • 2503.24290 • Published • 63
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models • Paper • 2503.24235 • Published • 55
Efficient Inference for Large Reasoning Models: A Survey • Paper • 2503.23077 • Published • 47
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 • Paper • 2503.24376 • Published • 39
Z1: Efficient Test-time Scaling with Code • Paper • 2504.00810 • Published • 27
Improved Visual-Spatial Reasoning via R1-Zero-Like Training • Paper • 2504.00883 • Published • 66
Understanding R1-Zero-Like Training: A Critical Perspective • Paper • 2503.20783 • Published • 55
Inference-Time Scaling for Generalist Reward Modeling • Paper • 2504.02495 • Published • 57
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme • Paper • 2504.02587 • Published • 32
Rethinking Reflection in Pre-Training • Paper • 2504.04022 • Published • 80
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models • Paper • 2504.04823 • Published • 31
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks • Paper • 2504.05118 • Published • 25
Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1) • Paper • 2504.03151 • Published • 14
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning • Paper • 2504.06958 • Published • 11
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning • Paper • 2504.07128 • Published • 86
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model • Paper • 2504.07615 • Published • 32
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning • Paper • 2504.08837 • Published • 43
DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training • Paper • 2504.09710 • Published • 19
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning • Paper • 2504.09641 • Published • 16
Reasoning Models Can Be Effective Without Thinking • Paper • 2504.09858 • Published • 12
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations • Paper • 2504.10481 • Published • 84
Efficient Reasoning Models: A Survey • Paper • 2504.10903 • Published • 19
Efficient Process Reward Model Training via Active Learning • Paper • 2504.10559 • Published • 13
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce • Paper • 2504.11343 • Published • 19
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs • Paper • 2504.11536 • Published • 61
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models • Paper • 2504.11468 • Published • 29
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation • Paper • 2504.13055 • Published • 19
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? • Paper • 2504.13837 • Published • 133
Learning to Reason under Off-Policy Guidance • Paper • 2504.14945 • Published • 86
FlowReasoner: Reinforcing Query-Level Meta-Agents • Paper • 2504.15257 • Published • 47
ToolRL: Reward is All Tool Learning Needs • Paper • 2504.13958 • Published • 45
OTC: Optimal Tool Calls via Reinforcement Learning • Paper • 2504.14870 • Published • 33
TTRL: Test-Time Reinforcement Learning • Paper • 2504.16084 • Published • 120
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models • Paper • 2504.15279 • Published • 75
Process Reward Models That Think • Paper • 2504.16828 • Published • 16
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning • Paper • 2504.16656 • Published • 58
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency • Paper • 2504.18589 • Published • 13
Reinforcement Learning for Reasoning in Large Language Models with One Training Example • Paper • 2504.20571 • Published • 97
WebThinker: Empowering Large Reasoning Models with Deep Research Capability • Paper • 2504.21776 • Published • 59
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math • Paper • 2504.21233 • Published • 48
Phi-4-reasoning Technical Report • Paper • 2504.21318 • Published • 52
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models • Paper • 2505.00551 • Published • 37
AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization • Paper • 2504.21659 • Published • 13
Llama-Nemotron: Efficient Reasoning Models • Paper • 2505.00949 • Published • 42
RM-R1: Reward Modeling as Reasoning • Paper • 2505.02387 • Published • 78
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL • Paper • 2505.02391 • Published • 25
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning • Paper • 2505.02835 • Published • 27
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning • Paper • 2505.03318 • Published • 94
Absolute Zero: Reinforced Self-play Reasoning with Zero Data • Paper • 2505.03335 • Published • 181
ZeroSearch: Incentivize the Search Capability of LLMs without Searching • Paper • 2505.04588 • Published • 66
Scalable Chain of Thoughts via Elastic Reasoning • Paper • 2505.05315 • Published • 26
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains • Paper • 2505.03981 • Published • 15
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining • Paper • 2505.07608 • Published • 81
DanceGRPO: Unleashing GRPO on Visual Generation • Paper • 2505.07818 • Published • 31
Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning • Paper • 2505.07263 • Published • 30
AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale • Paper • 2505.08311 • Published • 18
Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging • Paper • 2505.05464 • Published • 11
Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM? • Paper • 2505.09439 • Published • 9
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models • Paper • 2505.10554 • Published • 120
WorldPM: Scaling Human Preference Modeling • Paper • 2505.10527 • Published • 34
AdaptThink: Reasoning Models Can Learn When to Think • Paper • 2505.13417 • Published • 82
Thinkless: LLM Learns When to Think • Paper • 2505.13379 • Published • 51
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning • Paper • 2505.12081 • Published • 18
ExTrans: Multilingual Deep Reasoning Translation via Exemplar-Enhanced Reinforcement Learning • Paper • 2505.12996 • Published • 3
Optimizing Anytime Reasoning via Budget Relative Policy Optimization • Paper • 2505.13438 • Published • 36
VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank • Paper • 2505.14460 • Published • 31
Paper • 2505.14674 • Published • 36
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents • Paper • 2505.15277 • Published • 103
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning • Paper • 2505.16410 • Published • 57
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning • Paper • 2505.15966 • Published • 53
GRIT: Teaching MLLMs to Think with Images • Paper • 2505.15879 • Published • 12
VeriThinker: Learning to Verify Makes Reasoning Model Efficient • Paper • 2505.17941 • Published • 25
Synthetic Data RL: Task Definition Is All You Need • Paper • 2505.17063 • Published • 10
One RL to See Them All: Visual Triple Unified Reinforcement Learning • Paper • 2505.18129 • Published • 60
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning • Paper • 2505.13426 • Published • 13
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO • Paper • 2505.21457 • Published • 14
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models • Paper • 2505.22617 • Published • 127
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO • Paper • 2505.22453 • Published • 46
Sherlock: Self-Correcting Reasoning in Vision-Language Models • Paper • 2505.22651 • Published • 51
Skywork Open Reasoner 1 Technical Report • Paper • 2505.22312 • Published • 55
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start • Paper • 2505.22334 • Published • 37
Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs • Paper • 2505.19075 • Published • 21
DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning • Paper • 2505.14362 • Published • 2
Table-R1: Inference-Time Scaling for Table Reasoning • Paper • 2505.23621 • Published • 94
GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning • Paper • 2505.17022 • Published • 27
LLMs for Engineering: Teaching Models to Design High Powered Rockets • Paper • 2504.19394 • Published • 14
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models • Paper • 2505.24864 • Published • 133
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models • Paper • 2505.24025 • Published • 27
MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning • Paper • 2505.24871 • Published • 22
Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning • Paper • 2505.24850 • Published • 9
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning • Paper • 2506.01939 • Published • 176
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning • Paper • 2506.01713 • Published • 47
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning • Paper • 2505.24298 • Published • 27
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning • Paper • 2505.24726 • Published • 267
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation • Paper • 2506.02397 • Published • 36
OpenThoughts: Data Recipes for Reasoning Models • Paper • 2506.04178 • Published • 43
Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback • Paper • 2506.03106 • Published • 6
Reinforcement Pre-Training • Paper • 2506.08007 • Published • 252
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models • Paper • 2506.06395 • Published • 128
Paper • 2506.10910 • Published • 62
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention • Paper • 2506.13585 • Published • 260
VGR: Visual Grounded Reasoning • Paper • 2506.11991 • Published • 20
A Technical Study into Small Reasoning Language Models • Paper • 2506.13404 • Published • 9
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs • Paper • 2506.14245 • Published • 39
Truncated Proximal Policy Optimization • Paper • 2506.15050 • Published • 11
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective • Paper • 2506.14965 • Published • 49
RLPR: Extrapolating RLVR to General Domains without Verifiers • Paper • 2506.18254 • Published • 32
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs • Paper • 2506.18896 • Published • 28
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning • Paper • 2506.16141 • Published • 27
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning • Paper • 2506.19767 • Published • 13
MiCo: Multi-image Contrast for Reinforcement Visual Reasoning • Paper • 2506.22434 • Published • 10
Jan-nano Technical Report • Paper • 2506.22760 • Published • 9
Listener-Rewarded Thinking in VLMs for Image Preferences • Paper • 2506.22832 • Published • 23
Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling? • Paper • 2506.17417 • Published • 11
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing • Paper • 2506.21448 • Published • 7
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning • Paper • 2507.01006 • Published • 207
HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context • Paper • 2506.21277 • Published • 15
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers • Paper • 2506.23918 • Published • 84
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy • Paper • 2507.01352 • Published • 51
Energy-Based Transformers are Scalable Learners and Thinkers • Paper • 2507.02092 • Published • 57
A Survey on Latent Reasoning • Paper • 2507.06203 • Published • 85
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning • Paper • 2507.05920 • Published • 11
Perception-Aware Policy Optimization for Multimodal Reasoning • Paper • 2507.06448 • Published • 44
First Return, Entropy-Eliciting Explore • Paper • 2507.07017 • Published • 23
Scaling RL to Long Videos • Paper • 2507.07966 • Published • 151
PyVision: Agentic Vision with Dynamic Tooling • Paper • 2507.07998 • Published • 30
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning • Paper • 2507.05255 • Published • 68
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities • Paper • 2507.06261 • Published • 57
EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes • Paper • 2507.11407 • Published • 51
The Invisible Leash: Why RLVR May Not Escape Its Origin • Paper • 2507.14843 • Published • 82
Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR • Paper • 2507.15778 • Published • 19
A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning • Paper • 2507.14295 • Published • 13
Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning • Paper • 2507.17512 • Published • 34
Group Sequence Policy Optimization • Paper • 2507.18071 • Published • 256
Agentic Reinforced Policy Optimization • Paper • 2507.19849 • Published • 111
UloRL: An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities • Paper • 2507.19766 • Published • 10
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty • Paper • 2507.16806 • Published • 6