---
license: mit
datasets:
- GenPRM/GenPRM-MATH-Data
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
---

# Introduction

We propose GenPRM, a strong generative process reward model with the following features:

- reasoning with explicit CoT and code verification before providing the process judgment;
- improving Monte Carlo estimation and hard labels with Relative Progress Estimation (RPE);
- supporting GenPRM test-time scaling in a parallel manner with majority voting;
- supporting policy model test-time scaling with GenPRM as verifiers or critics.
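The parallel test-time scaling with majority voting mentioned above can be sketched as follows. This is a minimal illustration only: the `"+"`/`"-"` label format and the `majority_vote` helper are assumptions for exposition, not the repository's actual aggregation code.

```python
from collections import Counter

def majority_vote(judgments):
    """Aggregate per-step correctness judgments ("+" / "-") from N
    independently sampled GenPRM generations by majority voting.

    `judgments` is a list of N lists, one per sampled generation,
    each holding one label per solution step. Returns one voted
    label per step."""
    n_steps = len(judgments[0])
    voted = []
    for step in range(n_steps):
        labels = [sample[step] for sample in judgments]
        # Keep the most frequent label for this step.
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted

# Three sampled generations judging a four-step solution:
samples = [
    ["+", "+", "-", "-"],
    ["+", "-", "-", "-"],
    ["+", "+", "-", "+"],
]
print(majority_vote(samples))  # ['+', '+', '-', '-']
```

Because each generation is sampled independently, the N judgments can be produced in parallel and only this cheap aggregation runs afterwards.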

GenPRM achieves state-of-the-art performance across multiple benchmarks in two key roles:

- As a verifier: GenPRM-7B outperforms all classification-based PRMs of comparable size and even surpasses Qwen2.5-Math-PRM-72B via test-time scaling.
- As a critic: GenPRM-7B demonstrates superior critique capabilities, achieving 3.4× greater performance gains than DeepSeek-R1-Distill-Qwen-7B after 3 refinement iterations.
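As a rough illustration of the verifier role, Best-of-N selection with PRM step scores could look like the sketch below. The min-over-steps aggregation and the data layout are assumptions chosen for clarity (min, product, and last-step reductions all appear in the PRM literature), not GenPRM's actual pipeline.

```python
def solution_score(step_scores):
    # Assumed reduction: a solution is only as good as its weakest step.
    return min(step_scores)

def best_of_n(candidates):
    """Pick the candidate solution whose PRM step scores are best.

    `candidates` maps solution text -> list of per-step scores in [0, 1]
    produced by the process reward model."""
    return max(candidates, key=lambda sol: solution_score(candidates[sol]))

cands = {
    "solution A": [0.9, 0.8, 0.95],   # weakest step: 0.8
    "solution B": [0.99, 0.2, 0.9],   # weakest step: 0.2
}
print(best_of_n(cands))  # solution A
```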
24 |
|
25 |
+
- Project Page: [GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning](https://ryanliu112.github.io/GenPRM)
|
26 |
- Paper: [https://arxiv.org/abs/2504.00891](https://arxiv.org/abs/2504.00891)
|
27 |
- Code: [https://github.com/RyanLiu112/GenPRM](https://github.com/RyanLiu112/GenPRM)
|
28 |
+
- Awesome Process Reward Models: [Awesome Process Reward Models](https://github.com/RyanLiu112/Awesome-Process-Reward-Models)
|
29 |
+
- HF Paper Link:[GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning](https://hf.co/papers/2504.00891)
|
30 |
+
- HF Collection:[GenPRM](https://hf.co/collections/GenPRM/genprm-67ee4936234ba5dd16bb9943)

# Model details

For full training details, please refer to our [paper](https://arxiv.org/abs/2504.00891).

- Training data: the 23K SFT examples are released in [GenPRM-MATH-Data](https://huggingface.co/datasets/GenPRM/GenPRM-MATH-Data).
- Base model: we use the [DeepSeek-R1-Distill series](https://huggingface.co/deepseek-ai) (1.5B, 7B, and 32B) as our base models.

# How to use

The evaluation code of GenPRM is available in our GitHub repository: [https://github.com/RyanLiu112/GenPRM](https://github.com/RyanLiu112/GenPRM).

Here's a minimal example of using GenPRM for rationale generation and process supervision:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams