Learning Adaptive Parallel Reasoning with Language Models • Paper • 2504.15466 • Published 2 days ago • 35
Generative AI Act II: Test Time Scaling Drives Cognition Engineering • Paper • 2504.13828 • Published 6 days ago • 17
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? • Paper • 2504.13837 • Published 6 days ago • 94
MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges? • Paper • 2504.09702 • Published 11 days ago • 17
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs • Paper • 2504.11536 • Published 9 days ago • 58
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing • Paper • 2504.07964 • Published 14 days ago • 61
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? • Paper • 2504.06514 • Published 15 days ago • 39
How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients • Paper • 2504.10766 • Published 9 days ago • 39
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations • Paper • 2504.10481 • Published 10 days ago • 84
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories • Paper • 2504.08942 • Published 13 days ago • 27
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning • Paper • 2504.07128 • Published 22 days ago • 82
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens • Paper • 2504.07096 • Published 15 days ago • 73
Inference-Time Scaling for Generalist Reward Modeling • Paper • 2504.02495 • Published 21 days ago • 53
PaperBench: Evaluating AI's Ability to Replicate AI Research • Paper • 2504.01848 • Published 22 days ago • 36
ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations • Paper • 2504.00824 • Published 23 days ago • 40
Understanding R1-Zero-Like Training: A Critical Perspective • Paper • 2503.20783 • Published 29 days ago • 45