AI & ML interests

Democratizing access to useful AI tools and resources for journalists

fdaudens 
posted an update 16 days ago
Want to learn to build an AI Agent? I put together a cookbook for creating your own news research agent with OpenAI GPT-OSS:

- Searches headlines & specific sites
- Pulls full articles when you need depth
- Summarizes with clickable sources
- Runs in a simple Gradio chat UI
- No GPU, no local setup — just open-weight GPT-OSS models via Hugging Face

If you’ve been wanting to try agents but weren’t sure where to start, this is an end-to-end example you can fork, run, and adapt.

Full guide + code https://huggingface.co/blog/fdaudens/openai-gpt-oss-agent-inference-providers
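
If you just want the skeleton, here's a minimal sketch of the pattern the cookbook builds on (not the cookbook's exact code: the model ID is the real Hub name, but the system prompt and parameters are illustrative, and the real agent adds headline-search and article-fetch tools on top of this loop). It calls an open-weight GPT-OSS model through Hugging Face inference providers and wraps it in a Gradio chat UI; it assumes a recent Gradio version that supports `type="messages"`:

```python
import os
import gradio as gr
from huggingface_hub import InferenceClient

# Open-weight GPT-OSS served via Hugging Face inference providers (no local GPU)
client = InferenceClient(model="openai/gpt-oss-20b", token=os.environ["HF_TOKEN"])

def respond(message, history):
    # With type="messages", Gradio passes history as a list of chat dicts
    messages = [{"role": "system",
                 "content": "You are a news research assistant. Cite sources as links."}]
    messages += history + [{"role": "user", "content": message}]
    out = client.chat_completion(messages, max_tokens=512)
    return out.choices[0].message.content

gr.ChatInterface(respond, type="messages").launch()
```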
fdaudens 
posted an update 18 days ago
What can OpenAI’s new open models do with the news? I built a News Agent to find out.

It can answer questions about the news in real time, and every answer comes with original source links so you can dive deeper.

Ask it things like:
- "What are the top news stories today?"
- "What's the latest on artificial intelligence?"
- Follow-up questions on specific stories

Runs with Hugging Face inference providers, letting you compare results from the OpenAI 20B and 120B models.

So far, I’m quite impressed by the capabilities of even the smaller 20B model. Definitely not a perfect project, but curious to hear your thoughts!

fdaudens/gpt-oss-news-agent
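
If you want to run the 20B-vs-120B comparison outside the Space, here's a minimal sketch (assuming an HF token is configured; the question is just an example, and note the Space also wires in news-search tools, which this bare call omits):

```python
from huggingface_hub import InferenceClient

question = "What's the latest on artificial intelligence?"
for model_id in ("openai/gpt-oss-20b", "openai/gpt-oss-120b"):
    client = InferenceClient(model=model_id)
    out = client.chat_completion(
        [{"role": "user", "content": question}], max_tokens=256
    )
    # Print each model's answer for a side-by-side comparison
    print(f"### {model_id}\n{out.choices[0].message.content}\n")
```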
fdaudens 
posted an update 19 days ago
OpenAI’s GPT-OSS has sparked ~400 new models on Hugging Face and racked up 5M downloads in less than a week, already outpacing DeepSeek R1’s first-week numbers.

For comparison: when R1 launched, I tracked 550 derivatives (across 8 base models) in a week, with ~3M downloads. GPT-OSS is ahead on adoption and engagement.

It’s also the most-liked release of any major LLM this summer. The 20B and 120B versions quickly shot past Kimi K2, GLM 4.5, and others in likes.

Most-downloaded GPT-OSS models include LM Studio and Unsloth AI versions:
1️⃣ openai/gpt-oss-20b - 2.0M
2️⃣ lmstudio-community/gpt-oss-20b-MLX-8bit - 750K
3️⃣ openai/gpt-oss-120b - 430K
4️⃣ unsloth/gpt-oss-20b-GGUF - 380K
5️⃣ lmstudio-community/gpt-oss-20b-GGUF - 330K

The 20B version is clearly finding its audience, showing the power of smaller, faster, more memory- and energy-efficient models. (These numbers don’t include calls to the models via inference providers, so the real usage is likely even bigger, especially for the 120B version.)

Open-weight models let anyone build on top. Empower the builders, and innovation takes off. 🚀
Kseniase 
posted an update 20 days ago
6 Must-read books about AI and Machine Learning:

Sharing some free, useful resources for you. In this collection, we’ve gathered the most recent books to give you up-to-date information on key fundamental topics. Hope this helps you master AI and machine learning:

1. Machine Learning Systems by Vijay Janapa Reddi → https://www.mlsysbook.ai/
Provides a framework for building effective ML solutions, covering data engineering, optimization, hardware-aware training, inference acceleration, architecture choice, and other key principles

2. Generative Diffusion Modeling: A Practical Handbook by Zihan Ding, Chi Jin → https://arxiv.org/abs/2412.17162
Offers a unified view of diffusion models: probabilistic, score-based, consistency, rectified flow, pre/post-training. It aligns notations with code to close the “paper-to-code” gap.

3. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges → https://arxiv.org/abs/2104.13478
Explores unified geometric principles for analyzing neural network architectures (CNNs, RNNs, GNNs, Transformers) and guiding the design of future ones

4. Mathematical Foundations of Geometric Deep Learning by Haitz Saez de Ocariz Borde and Michael Bronstein → https://arxiv.org/abs/2508.02723
Dives into the key math concepts behind geometric deep learning: geometric and analytical structures, vector calculus, differential geometry, etc.

5. Interpretable Machine Learning by Christoph Molnar → https://github.com/christophM/interpretable-ml-book
Practical guide to simple, transparent models (e.g., decision trees) and model-agnostic methods like LIME, Shapley values, permutation importance, and accumulated local effects.

6. Understanding Deep Learning by Simon J.D. Prince → https://udlbook.github.io/udlbook/
Explores core deep learning concepts: models, training, evaluation, RL, architectures for images, text, and graphs, addressing open theoretical questions

Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe
fdaudens 
posted an update 25 days ago
Well, it took just 2 hours for openai/gpt-oss-120b to hit #1 on Hugging Face. Don’t remember seeing anything rise that fast!
Kseniase 
posted an update 27 days ago
12 Powerful World Models

World models are one of the most challenging areas in AI, pushing the boundaries of reasoning, perception, and planning. They're gen AI systems that help models and agents learn internal representations of real-world environments.

Today, we invite you to take a look at 12 standout examples:

1. WorldVLA → WorldVLA: Towards Autoregressive Action World Model (2506.21539)
This autoregressive world model integrates action prediction and visual world modeling in a single framework, allowing each to enhance the other. It introduces an attention masking strategy to reduce action prediction errors

2. SimuRA → https://arxiv.org/abs/2507.23773
A generalized world model that uses a language-based world model to simulate and plan actions before execution, enabling more general and flexible reasoning

3. PAN (Physical, Agentic, and Nested) world models → Critiques of World Models (2507.05169)
Has a hybrid architecture that combines discrete concept-based reasoning (via LLMs) with continuous perceptual simulation (via diffusion models), enabling rich multi-level, multimodal understanding and prediction

4. MineWorld by Microsoft Research → MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft (2504.08388)
Enables real-time, interactive world modeling in Minecraft by combining visual and action tokenization within an autoregressive Transformer. It uses parallel decoding for fast scene generation (4–7 FPS)

5. WorldMem → WORLDMEM: Long-term Consistent World Simulation with Memory (2504.12369)
Uses a memory bank with attention over time-stamped frames and states to maintain long-term and 3D spatial consistency in scene generation, so it can reconstruct past scenes and simulate dynamic world changes across large temporal gaps

Read further below ⬇️

If you like this, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

Plus explore this article for a comprehensive overview of the history and current evolution of world models: https://www.turingpost.com/p/topic-35-what-are-world-models
jsulz 
posted an update about 1 month ago
We've crossed 1 million repositories backed by Xet storage on Hugging Face! 🚀🚀🚀

You can follow our progress converting the Hub from Git LFS to Xet at jsulz/ready-xet-go

We have a lot of repos left to migrate, which means I have plenty of time to add more animations 🤪
yjernite 
posted an update about 1 month ago
𝗙𝗶𝗿𝘀𝘁 𝗚𝗣𝗔𝗜 𝗠𝗼𝗱𝗲𝗹 𝘄𝗶𝘁𝗵 𝗘𝗨 𝗗𝗮𝘁𝗮 𝗧𝗿𝗮𝗻𝘀𝗽𝗮𝗿𝗲𝗻𝗰𝘆 𝗧𝗲𝗺𝗽𝗹𝗮𝘁𝗲? 🇪🇺

With the release of the EU data transparency template this week, we finally got to see one of the most meaningful artifacts to come out of the AI Act implementation so far (haven't you heard? AI's all about the data! 📊📚)

The impact of the template will depend on how effectively it establishes a minimum meaningful transparency standard for companies that don't otherwise offer any transparency into their handling of e.g. personal data or (anti?-)competitive practices in commercial licensing - we'll see how those play out as new models are released after August 2nd 👀


In the meantime, I wanted to see how the template works for a fully open-source + commercially viable model, so I filled it out for SmolLM3, which my colleagues at Hugging Face released earlier this month 🤗 ICYMI, it's fully open-source with 3B parameters and performance matching the best similar-size models (I've switched all my local apps from Qwen3 to it, you should too 💡)

Verdict: congrats to the European Commission AI Office for making it so straightforward! Fully open and transparent models remain a cornerstone of informed regulation and governance, but the different organizational needs of their developers aren't always properly accounted for in new regulation. In this case, it took me all of two hours to fill out and publish the template (including reading the guidelines) - so kudos for making it feasible for smaller and distributed organizations 🙌 Definitely a step forward for transparency 🔍

To learn more have a look at:

- The SmolLM3 model: HuggingFaceTB/SmolLM3-3B
- Its filled out Public Summary of Training Content: hfmlsoc/smollm3-eu-data-transparency
- And if you're interested, some previous remarks on regulatory minimum meaningful standards for data disclosure: https://huggingface.co/blog/yjernite/naiac-data-transparency
Kseniase 
posted an update about 1 month ago
9 new policy optimization techniques

Reinforcement Learning (RL) isn't stuck in the same old PPO loop - in the last two months alone, researchers have introduced a new wave of techniques, reshaping how we train and fine-tune LLMs, VLMs, and agents.

Here are 9 fresh policy optimization techniques worth knowing:

1. GSPO: Group Sequence Policy Optimization → Group Sequence Policy Optimization (2507.18071)
Shifts optimization, clipping, and rewarding from the token level to the sequence level to capture the full picture and increase stability compared to GRPO. The GSPO-token variant also allows token-level fine-tuning (see the sketch after this list).

2. LAPO: Length-Adaptive Policy Optimization → LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization (2507.15758)
A two-stage RL framework that trains models to adaptively control reasoning length by learning typical solution lengths for shorter and more efficient reasoning.

3. HBPO: Hierarchical Budget Policy Optimization → Hierarchical Budget Policy Optimization for Adaptive Reasoning (2507.15844)
This one trains models to adapt reasoning depth based on problem complexity. It divides training samples into subgroups with different token budgets, using budget-aware rewards to align reasoning effort with task difficulty.

4. SOPHIA: Semi-off-policy reinforcement learning → Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning (2507.16814)
Combines on-policy visual understanding from Vision Language Models (VLMs) with off-policy reasoning from an LLM, assigning outcome-based rewards and propagating visual rewards backward through the reasoning steps.

5. RePO: Replay-Enhanced Policy Optimization → RePO: Replay-Enhanced Policy Optimization (2506.09340)
Introduces a replay buffer into on-policy RL for LLMs, retrieving diverse off-policy samples for each prompt to broaden the training data per prompt
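
To make the GSPO shift in item 1 concrete, here is an illustrative PyTorch sketch of the core objective (my paraphrase of the paper's idea, not its official code; padding masks and the GSPO-token variant are omitted):

```python
import torch

def gspo_loss(logp_new, logp_old, advantages, eps=0.2):
    # logp_new, logp_old: (batch, seq_len) token log-probs under new/old policy
    # advantages: (batch,) one advantage per sequence (e.g. group-normalized reward)
    # Sequence-level, length-normalized importance ratio: exp(mean token log-ratio)
    ratio = torch.exp((logp_new - logp_old).mean(dim=1))
    # Clip once per sequence instead of per token (the key change vs. PPO/GRPO)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    # Standard clipped surrogate objective, applied per sequence
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```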

Read further below ⬇️
If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
Kseniase 
posted an update about 1 month ago
6 Essential Reads on core AI/ML topics:

Time to look at some free useful resources that can help you upgrade your knowledge of AI and machine learning!
Today we offer you these 6 must-read surveys that can be your perfect guides to the major fields and techniques:

1. Foundations of Large Language Models by Tong Xiao and Jingbo Zhu → https://arxiv.org/abs/2501.09223
Many recommend this 270-page book as a good resource to focus on fundamental concepts, such as pre-training, generative models, prompting, alignment, and inference

2. Large Language Models Post-Training: Surveying Techniques from Alignment to Reasoning → A Survey on Post-training of Large Language Models (2503.06072)
Read this to master policy optimization (RLHF, DPO, GRPO), supervised and parameter-efficient fine-tuning, reasoning, integration, and adaptation techniques

3. Agentic Large Language Models, a survey by Leiden University → https://arxiv.org/abs/2503.23037
Surveys agentic LLMs across reasoning, tools, and multi-agent collaboration, highlighting their synergy. It also explores their promise, risks, and applications in medicine, finance, and science.

4. A Survey of Context Engineering for Large Language Models → A Survey of Context Engineering for Large Language Models (2507.13334)
Defines Context Engineering as systematic info design for LLMs beyond prompting, covering retrieval, processing, management, and architectures like RAG and multi-agent systems

5. A Survey of Generative Categories and Techniques in Multimodal Large Language Models → https://arxiv.org/abs/2506.10016
Covers multimodal models, exploring six generative modalities, key techniques (SSL, RLHF, CoT), architectural trends, and challenges

6. Large Language models for Time Series Analysis: Techniques, Applications, and Challenges → https://arxiv.org/abs/2506.11040
Explains how LLMs transform time series analysis by enhancing pattern recognition and long-term dependency handling + shows how to build them

Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe
fdaudens 
posted an update about 1 month ago
AudioRAG is becoming real! Just built a demo with ColQwen-Omni that does semantic search on raw audio, no transcription needed.

Drop in a podcast, ask your question, and it finds the exact chunks where it happens. You can also get a written answer.

What’s exciting: it skips transcription, making it faster and better at capturing emotion, ambient sound, and tone, surfacing results text search would miss.
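
Under the hood, this style of retrieval scores each audio chunk against the query with late interaction (MaxSim), the same mechanism behind the ColPali family. A conceptual sketch, with the embedding step left out because it depends on the ColQwen-Omni API (see the blog post below for the real code):

```python
import torch

def maxsim_score(query_emb: torch.Tensor, chunk_emb: torch.Tensor) -> float:
    # query_emb: (query_tokens, dim), chunk_emb: (chunk_tokens, dim),
    # both produced by a multi-vector (late-interaction) embedder
    sim = query_emb @ chunk_emb.T            # token-to-token similarities
    # For each query token, keep its best match in the chunk, then sum
    return sim.max(dim=1).values.sum().item()

def top_chunks(query_emb, chunk_embs, k=3):
    scores = [maxsim_score(query_emb, c) for c in chunk_embs]
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
```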

- Demo: fdaudens/colqwen-omni-demo
- Blog post from ColQwen team: https://huggingface.co/blog/manu/colqwen-omni-omnimodal-retrieval
jsulz 
posted an update about 2 months ago
We've moved over 20PB from Git LFS to Xet on the Hub without downtime or data loss. Having things "just work" on a migration of this scale is about as good as it gets.

Now, we're migrating the rest of the Hub https://huggingface.co/blog/migrating-the-hub-to-xet

But how did we get here?

In the early days of joining Hugging Face, we made a few key design decisions:
* There would be no "hard cut-over" from Git LFS to Xet
* A Xet-enabled repository should be able to contain both Xet and LFS files
* Repository migrations from LFS to Xet can run in the background without disrupting downloads or uploads

These were largely driven by our desire to ensure the community could keep working without interruption.

We cover the infrastructure making this all go in this post, specifically:
* An integral piece of infrastructure known internally as the Git LFS Bridge
* Background content migrations that run around the clock

To skip the wait and join Xet now, sign up here https://huggingface.co/join/xet
fdaudens 
posted an update about 2 months ago
You might not have heard of Moonshot AI — but within 24 hours, their new model Kimi K2 shot to the top of Hugging Face’s trending leaderboard.

So… who are they, and why does it matter?

Had a lot of fun co-writing this blog post with @xianbao , with key insights translated from Chinese, to unpack how this startup built a model that outperforms GPT-4.1, Claude Opus, and DeepSeek V3 on several major benchmarks.

🧵 A few standout facts:

1. From zero to $3.3B in 18 months:
Founded in March 2023, Moonshot is now backed by Alibaba, Tencent, Meituan, and HongShan.

2. A CEO who thinks from the end:
Yang Zhilin (31) previously worked at Meta AI, Google Brain, and Carnegie Mellon. His vision? Nothing less than AGI — still a rare ambition among Chinese AI labs.

3. A trillion-parameter model that’s surprisingly efficient:
Kimi K2 uses a mixture-of-experts architecture (32B active params per inference) and dominates on coding/math benchmarks.

4. The secret weapon: Muon optimizer:
A new training method that doubles efficiency, cuts memory in half, and ran 15.5T tokens with zero failures. Big implications.
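
For the curious, a rough sketch of the core Muon update as publicly described (the Newton-Schulz coefficients follow the open-source reference implementation; Moonshot's own variant adds further stability modifications on top that this sketch doesn't capture):

```python
import torch

def newton_schulz(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Approximately orthogonalize G with a quintic Newton-Schulz iteration
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(param, grad, momentum_buf, lr=0.02, beta=0.95):
    # SGD-momentum, then replace the update matrix with its orthogonalized form
    # (shape-dependent scaling and non-2D params omitted for brevity)
    momentum_buf.mul_(beta).add_(grad)
    param.add_(newton_schulz(momentum_buf), alpha=-lr)
```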

Most importantly, their move from closed to open source signals a broader shift in China’s AI scene — following Baidu’s pivot. But as Yang puts it: “Users are the only real leaderboard.”

👇 Check out the full post to explore what Kimi K2 can do, how to try it, and why it matters for the future of open-source LLMs:
https://huggingface.co/blog/fdaudens/moonshot-ai-kimi-k2-explained
evijit 
posted an update about 2 months ago
New blog post alert! "What is the Hugging Face Community Building?", with @yjernite and @irenesolaiman

What 1.8 Million Models Reveal About Open Source Innovation: Our latest deep dive into the Hugging Face Hub reveals patterns that challenge conventional AI narratives:

🔗 Models become platforms for innovation: Qwen, Llama, and Gemma models have spawned entire ecosystems of specialized variants. Looking at derivative works shows community adoption better than any single metric.

📊 Datasets reveal the foundation layer:
→ Most downloaded datasets are evaluation benchmarks (MMLU, SQuAD, GLUE)
→ Universities and research institutions dominate foundational data
→ Domain-specific datasets thrive across finance, healthcare, robotics, and science
→ Open actors provide the datasets that power most AI development

🏛️ Research institutions lead the charge: AI2 (Allen Institute) emerges as one of the most active contributors, alongside significant activity from IBM, NVIDIA, and international organizations. The open source ecosystem spans far beyond Big Tech.

🔍 Interactive exploration tools: We've built several tools to help you discover patterns!

ModelVerse Explorer - organizational contributions
DataVerse Explorer - dataset patterns
Organization HeatMap - activity over time
Base Model Explorer - model family trees
Semantic Search - find models by capability

📚 Academic research is thriving: Researchers are already producing valuable insights, including recent work at FAccT 2025: "The Brief and Wondrous Life of Open Models." We've also made hub datasets, weekly snapshots, and other data available for your own analysis.

The bottom line: AI development is far more distributed, diverse, and collaborative than popular narratives suggest. Real innovation happens through community collaboration across specialized domains.

Read: https://huggingface.co/blog/evijit/hf-hub-ecosystem-overview
fdaudens 
posted an update about 2 months ago
AI is reshaping everything—how we work, how we feel, even how nations compete.

Today’s reads cut across power, emotion, and disruption.

Here’s what stood out and why it matters 👇

AI might “solve” loneliness, but this could be a problem, as the discomfort of loneliness shapes us in important ways. 💔 https://t.co/k2Q9le6G0P

A new study warns of significant risks in using AI therapy chatbots, highlighting issues like stigmatization and inappropriate responses. 🤖 https://t.co/EFyW0RbYVl

AI is already showing signs of slashing job openings in the UK, particularly in roles exposed to the technology, suggesting a labor market slowdown. 📉 https://t.co/hhs0BbqIMa

AI firms like OpenAI are poaching Wall Street quants with massive paydays, shifting the talent landscape for building artificial general intelligence. 💰 https://www.businessinsider.com/ai-talent-openai-wall-street-quant-trading-firms-2025-7

Speaking of which: Nvidia CEO Jensen Huang disagrees with Anthropic CEO Dario Amodei on whether AI will create more jobs—or trigger a “white-collar apocalypse.” Huang believes AI will create vastly more, and better, jobs. ⚔️ https://t.co/YHWhY7qvSq

Can Nvidia convince governments to pay for “sovereign AI”? Politicians are warming to the idea of national AI systems, but it might not reduce dependence on US tech. 🌍 https://t.co/htQDzJAIDu
Kseniase 
posted an update about 2 months ago
13 New types of LoRA

LoRA (Low-Rank Adaptation) is a popular lightweight method for fine-tuning AI models. It doesn't update the full model; instead, it adds small trainable components (low-rank matrices) while keeping the original weights frozen. Only these adapters are trained.
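
For reference, the vanilla mechanism all 13 variants build on, in a few lines (a standard textbook-style sketch, not code from any of the papers below):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pretrained weights
            p.requires_grad_(False)
        # Trainable low-rank pair; B starts at zero so training begins
        # exactly at the pretrained model
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

SingLoRA (item 2 below) is the variant that replaces this A/B pair with a single matrix multiplied by its own transpose.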

Recently, many interesting new LoRA variations came out, so it’s a great time to take a look at these 13 clever approaches:

1. T-LoRA → T-LoRA: Single Image Diffusion Model Customization Without Overfitting (2507.05964)
A timestep-dependent LoRA method for adapting diffusion models with a single image. It dynamically adjusts updates and uses orthogonal initialization to reduce overlap, achieving better fidelity–alignment balance than standard LoRA

2. SingLoRA → SingLoRA: Low Rank Adaptation Using a Single Matrix (2507.05566)
Simplifies LoRA by using only one small matrix instead of the usual two, multiplying it by its own transpose (like A × Aᵀ). It uses half the parameters of LoRA and avoids scale mismatch between different matrices

3. LiON-LoRA → LiON-LoRA: Rethinking LoRA Fusion to Unify Controllable Spatial and Temporal Generation for Video Diffusion (2507.05678)
Improves control and precision in video diffusion models when training data is limited. It builds on LoRA, adding 3 key principles: linear scalability, orthogonality, and norm consistency. A controllable token and modified self-attention enable smooth adjustment of motion

4. LoRA-Mixer → LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing (2507.00029)
Combines LoRA and mixture-of-experts (MoE) to adapt LLMs for multiple tasks. It dynamically routes task-specific LoRA experts into linear projections of attention modules, supporting both joint training and frozen expert reuse

5. QR-LoRA → QR-LoRA: Efficient and Disentangled Fine-tuning via QR Decomposition for Customized Generation (2507.04599)
Separates content and style when combining multiple LoRA adapters. It implements QR decomposition to structure parameter updates, where the orthogonal Q matrix reduces interference between features, and the R matrix captures specific transformations

Read further in the comments 👇

If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe