NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation • Paper • 2504.13055 • Published 4 days ago
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures • Paper • 2410.13754 • Published Oct 17, 2024
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures • Paper • 2406.06565 • Published Jun 3, 2024