davidoj01/unsloth-phi-4-Instruct-LORA-Open-R1-Code-GRPO-b2-as4-t07-lr1en5 • Text Generation • Updated Apr 9 • 4 downloads
AmberYifan/Qwen2.5-1.5B-Code-GRPO-dense-reward-3k • Text Generation • 2B parameters • Updated 25 days ago • 35 downloads • 1 like