ldwang

ftgreat

AI & ML interests

LLM, MLLM, Infra

Recent Activity

upvoted a paper about 8 hours ago

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 69

liked a model about 8 hours ago

agents-course/notebooks

Updated 15 days ago • 462

liked 2 datasets 3 days ago

nvidia/Nemotron-Pretraining-SFT-v1

Viewer • Updated 3 days ago • 358M • 205 • 6

nvidia/Nemotron-CC-v2

Viewer • Updated 3 days ago • 5.81B • 177 • 26

liked a model 4 days ago

arcee-ai/AFM-4.5B

Text Generation • 5B • Updated 3 days ago • 3.6k • 64

liked a dataset 10 days ago

BAAI/ShareRobot

Viewer • Updated about 22 hours ago • 6.45k • 13.6k • 22

liked a dataset 12 days ago

ByteDance-Seed/mga-fineweb-edu

Viewer • Updated May 19 • 846M • 2.72k • 32

liked a model 13 days ago

HuggingFaceTB/SmolVLM-Instruct

Image-Text-to-Text • 2B • Updated Apr 8 • 73k • 534

liked a Space 13 days ago

137

smolagents LLM leaderboard

🏆

A leaderboard for LLMs powering smolagents

updated a collection 14 days ago

MiscAgentic

Collection

3 items • Updated 14 days ago • 1

upvoted a collection 14 days ago

MiscAgentic

Collection

3 items • Updated 14 days ago • 1

liked a dataset 14 days ago

smolagents/benchmark-v1

Viewer • Updated Mar 4 • 132 • 391 • 15

upvoted a paper 14 days ago

Executable Code Actions Elicit Better LLM Agents

Paper • 2402.01030 • Published Feb 1, 2024 • 164

upvoted an article 15 days ago

Introducing smolagents: simple agents that write actions in code.

Article • and 2 others • Dec 31, 2024 • 1.11k

authored a paper 17 days ago

Trainable Dynamic Mask Sparse Attention

Paper • 2508.02124 • Published 19 days ago • 15

liked a dataset 17 days ago

princeton-nlp/SWE-bench

Viewer • Updated Mar 3 • 21.5k • 18.9k • 120

liked a Space 17 days ago

250

GPT-OSS-120B on AMD MI300X

💻

gpt-oss-120b model running on AMD MI300 infrastructure.

liked a dataset 17 days ago

HuggingFaceH4/Multilingual-Thinking

Viewer • Updated 16 days ago • 1k • 15.3k • 64

reacted to JingzeShi's post with 🤗 17 days ago

Post

3999

Trainable selective sampling and sparse attention kernels are indispensable in the era of context engineering. We hope our work will be helpful to everyone! 🤗

Trainable Dynamic Mask Sparse Attention (2508.02124)

liked a model 17 days ago

openai/gpt-oss-20b

Text Generation • 22B • Updated 9 days ago • 6.35M • 3.2k