Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities Paper • 2507.13158 • Published Jul 17 • 24
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20 • 106
Rethinking Diverse Human Preference Learning through Principal Component Analysis Paper • 2502.13131 • Published Feb 18 • 38
Holarissun/REPROD_dpo_helpfulhelpful_gpt4_subset-1_modelgemma7b_maxsteps10000_bz8_lr5e-06 Updated May 29, 2024 • 1
Holarissun/REPROD_dpo_harmlessharmless_gpt4_subset-1_modelgemma7b_maxsteps10000_bz8_lr5e-06 Updated May 29, 2024 • 1
Holarissun/REPROD_dpo_helpfulhelpful_human_subset-1_modelgemma7b_maxsteps10000_bz8_lr5e-06 Updated May 29, 2024 • 1
Holarissun/REPROD_dpo_harmlessharmless_human_subset-1_modelgemma7b_maxsteps10000_bz8_lr5e-06 Updated May 29, 2024 • 1
Holarissun/REPROD_dpo_helpfulhelpful_human_subset-1_modelgemma2b_maxsteps10000_bz8_lr5e-06 Updated May 28, 2024 • 1
Holarissun/REPROD_dpo_harmlessharmless_human_subset-1_modelgemma2b_maxsteps10000_bz8_lr5e-06 Updated May 28, 2024 • 1
Holarissun/REPROD_dpo_helpfulhelpful_human_subset-1_modelgemma2b_maxsteps10000_bz8_lr5e-05 Updated May 25, 2024 • 1
Holarissun/REPROD_dpo_harmlessharmless_human_subset-1_modelgemma2b_maxsteps6000_bz8_lr5e-05 Updated May 24, 2024 • 2
Holarissun/REPROD_dpo_helpfulhelpful_gpt4_subset-1_modelgemma2b_maxsteps10000_bz8_lr1e-05 Updated May 24, 2024 • 1
Holarissun/REPROD_dpo_harmlessharmless_human_subset-1_modelgemma2b_maxsteps6000_bz8_lr5e-06 Updated May 24, 2024 • 1
Holarissun/REPROD_dpo_helpfulhelpful_gpt3_subset-1_modelgemma2b_maxsteps10000_bz8_lr1e-05 Updated May 24, 2024 • 1 • 1
Holarissun/REPROD_dpo_helpfulhelpful_human_subset-1_modelgemma2b_maxsteps10000_bz8_lr1e-05 Updated May 24, 2024 • 1