Michael109's Collections
RL

updated May 25

  • CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models

    Paper • 2505.12504 • Published May 18 • 24

  • Web-Shepherd: Advancing PRMs for Reinforcing Web Agents

    Paper • 2505.15277 • Published May 21 • 103

  • T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

    Paper • 2505.00703 • Published May 1 • 45

  • OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning

    Paper • 2505.08617 • Published May 13 • 42

  • Optimizing Anytime Reasoning via Budget Relative Policy Optimization

    Paper • 2505.13438 • Published May 19 • 36

  • DanceGRPO: Unleashing GRPO on Visual Generation

    Paper • 2505.07818 • Published May 12 • 31

  • R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

    Paper • 2505.02835 • Published May 5 • 27

  • TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning

    Paper • 2505.14625 • Published May 20 • 13