JiraiBench: A Bilingual Benchmark for Evaluating Large Language Models' Detection of Human Self-Destructive Behavior Content in Jirai Community Paper • 2503.21679 • Published 24 days ago • 1
ValueFX9507/Tifa-DeepsexV2-7b-MGRPO-GGUF-Q8 Reinforcement Learning • Updated 23 days ago • 24.3k • 148
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era Paper • 2503.12329 • Published Mar 16 • 24
Implicit Reasoning in Transformers is Reasoning through Shortcuts Paper • 2503.07604 • Published Mar 10 • 21