Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning Paper • 2505.01441 • Published Apr 28 • 39
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning Paper • 2505.03318 • Published May 6 • 94
rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset Paper • 2505.21297 • Published May 27 • 30