CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published 5 days ago • 86
EMOVA-Datasets Collection A collection of EMOVA datasets (https://emova-ollm.github.io/) • 6 items • Updated Mar 14 • 2
Article LeRobot goes to driving school: World’s largest open-source self-driving dataset Mar 11 • 76
SYNTHETIC-1 Collection A collection of tasks & verifiers for reasoning datasets • 9 items • Updated Feb 20 • 51
Article Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥 Feb 18 • 98
Article From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub Feb 12 • 62