hhenryz's Collections
Personal Interest

updated Feb 25

  • MLGym: A New Framework and Benchmark for Advancing AI Research Agents

    Paper • 2502.14499 • Published Feb 20 • 193

  • Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark

    Paper • 2501.05444 • Published Jan 9 • 3

  • Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models

    Paper • 2502.14191 • Published Feb 20 • 7

  • CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models

    Paper • 2502.16614 • Published Feb 23 • 27