Jiapeng Luo

woolpeeker

AI & ML interests

None yet

Organizations

None yet

woolpeeker's activity

upvoted an article 10 days ago
Article

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

By NormalUhr
21

Great tutorial

image.png
About this equation: should the step-wise reward be inside the sum over the gradients, so that each step's gradient is multiplied by its own reward?

updated a Space almost 2 years ago
updated a model almost 2 years ago