FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Paper • 2506.20920 • Published Jun 26 • 69
Article Introducing smolagents: simple agents that write actions in code. By m-ric and 2 others • Dec 31, 2024
GPT-OSS-120B on AMD MI300X 💻 gpt-oss-120b model running on AMD MI300 infrastructure.
Post Trainable selective sampling and sparse attention kernels are indispensable in the era of context engineering. We hope our work will be helpful to everyone! 🤗 Trainable Dynamic Mask Sparse Attention (2508.02124)