Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning Paper • 2508.09726 • Published 11 days ago • 11 • 2
Language Models Prefer What They Know: Relative Confidence Estimation via Confidence Preferences Paper • 2502.01126 • Published Feb 3 • 4 • 2