cybershiptrooper/1p_max_8B-continuous-RM-n_examples_1000-probe_linear_layers_10 Text Generation • 8B • Updated May 22 • 4
KevinG/Meta-Llama-3-8B-Instruct-GRPO-alpaca_naive_50_no_KL Text Generation • 8B • Updated about 1 month ago • 121
KevinG/Meta-Llama-3-8B-Instruct-GRPO-alpaca_naive_100_no_KL Text Generation • 8B • Updated 30 days ago • 7
KevinG/Meta-Llama-3-8B-Instruct-GRPO-alpaca_naive_500_no_KL Text Generation • 8B • Updated 30 days ago • 7