OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents (Paper 2506.14866, published Jun 17)
Open LLM Leaderboard 🏆: Track, rank and evaluate open LLMs and chatbots