DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published Mar 18 • 120
World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning Paper • 2503.10480 • Published Mar 13 • 52
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training Paper • 2503.08525 • Published Mar 11 • 16
Query of CC: Unearthing Large Scale Domain-Specific Knowledge from Public Corpora Paper • 2401.14624 • Published Jan 26, 2024 • 1