Jack Cloudman

JackCloudman

AI & ML interests

None yet

Organizations

Hugging Face Discord Community

JackCloudman's activity

New activity in TheDrummer/Rivermind-12B-v1 7 days ago

Fantastic model!

#3 opened 7 days ago by JackCloudman
reacted to danielhanchen's post with 🤗 15 days ago
reacted to abidlabs's post with ❤️ 18 days ago
JOURNEY TO 1 MILLION DEVELOPERS

5 years ago, we launched Gradio as a simple Python library to let researchers at Stanford easily demo computer vision models with a web interface.

Today, Gradio is used by >1 million developers each month to build and share AI web apps. This includes some of the most popular open-source projects of all time, like Automatic1111, Fooocus, Oobabooga's Text WebUI, Dall-E Mini, and LLaMA-Factory.

How did we get here? How did Gradio keep growing in the very crowded field of open-source Python libraries? I get this question a lot from folks who are building their own open-source libraries. This post distills some of the lessons that I have learned over the past few years:

1. Invest in good primitives, not high-level abstractions
2. Embed virality directly into your library
3. Focus on a (growing) niche
4. Your only roadmap should be rapid iteration
5. Maximize ways users can consume your library's outputs

1. Invest in good primitives, not high-level abstractions

When we first launched Gradio, we offered only one high-level class (gr.Interface), which created a complete web app from a single Python function. We quickly realized that developers wanted to create other kinds of apps (e.g. multi-step workflows, chatbots, streaming applications), but as we started listing out the apps users wanted to build, we realized what we needed to do:

Read the rest here: https://x.com/abidlabs/status/1907886
reacted to m-ric's post with ❤️ 20 days ago
🚀 The DeepSeek R1 moment has come for GUI agents: rule-based reinforcement learning gives better results than SFT with 500x smaller datasets!

Traditionally (by which I mean "in the last few months"), GUI agents have been trained with supervised fine-tuning (SFT). This meant collecting huge datasets of screen captures from people using computers and using these to fine-tune your model. 📚

👉 But last week, a new paper introduced UI-R1, applying DeepSeek's R1-style rule-based reinforcement learning (RL) specifically to GUI action prediction tasks.
This is big news: with RL, maybe we could build good agents without the need for huge datasets.

UI-R1 uses a unified reward function that evaluates multiple sampled responses from the model, optimizing the policy with algorithms such as Group Relative Policy Optimization (GRPO).

Specifically, the reward function assesses (see the sketch after this list):
🎯 Action type accuracy: Does the predicted action match the ground truth?
📍 Coordinate accuracy (specifically for clicks): Is the predicted click within the correct bounding box?
📑 Output format: Does the model clearly articulate both its reasoning and final action?

Using just 136 carefully selected mobile tasks (compared to 76,000 tasks for larger models like OS-Atlas), UI-R1 shows significant efficiency and improved performance:
📈 Boosted action prediction accuracy from 76% to 89% on AndroidControl.
🌐 Outperformed larger, SFT-trained models (e.g., OS-Atlas-7B), demonstrating superior results with vastly fewer data points (136 tasks vs. 76K).
🔍 Enhanced adaptability and generalization, excelling even in out-of-domain scenarios.

The paper tests this RL-based method only on low-level GUI tasks. Could it generalize to more complex interactions? 🧐

Read the full paper here 👉 UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning (2503.21620)