TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling Paper • 2508.17445 • Published 5 days ago • 70
ReasoningTransferability/UniReason-Qwen3-14B-no-think-SFT Text Generation • 15B • Updated 4 days ago • 68 • 1
ReasoningTransferability/UniReason-Qwen3-14B-think-SFT Text Generation • 15B • Updated 4 days ago • 50
Evaluating Vision-Language Models as Evaluators in Path Planning Paper • 2411.18711 • Published Nov 27, 2024
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search Paper • 2503.10582 • Published Mar 13 • 24
Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators Paper • 2503.19877 • Published Mar 25 • 1
VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge Paper • 2504.10342 • Published Apr 14 • 11