SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning Paper • 2504.09081 • Published 10 days ago • 16
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs Paper • 2504.11536 • Published 7 days ago • 56
Heimdall: test-time scaling on the generative verification Paper • 2504.10337 • Published 8 days ago • 32
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations Paper • 2504.10481 • Published 8 days ago • 82
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents Paper • 2407.18901 • Published Jul 26, 2024 • 34