Inference-Time Scaling for Generalist Reward Modeling Paper • 2504.02495 • Published 21 days ago
DNA-R1 Collection Reasoning model distilled from DeepSeek-R1, enhanced with GRPO using supplementary reasoning datasets. • 1 item • Updated 1 day ago
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 21 items • Updated 8 days ago
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023