Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond Paper • 2503.10460 • Published Mar 13 • 28
view article Article Improving Hugging Face Training Efficiency Through Packing with Flash Attention Aug 21, 2024 • 33
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning Paper • 2502.14768 • Published Feb 20 • 48
Cautious Optimizers: Improving Training with One Line of Code Paper • 2411.16085 • Published Nov 25, 2024 • 21
high-quality Chinese training datasets Collection a suite of high-quality Chinese datasets, used for pretraining, fine-tuning or preference alignment. And the models trained on these datasets. • 13 items • Updated Mar 11 • 12
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling Paper • 2407.21787 • Published Jul 31, 2024 • 13
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? Paper • 2411.16489 • Published Nov 25, 2024 • 49