zhao1iang committed · Commit 6afac55 · verified
1 Parent(s): 7bb9ab8

Update README.md

Files changed (1)
  1. README.md +0 -3
README.md CHANGED
@@ -40,7 +40,6 @@ As of September 2024, Skywork-Critic-Llama3.1-70B **ranks first** on RewardBench
  | **Skywork-Critic-Llama3.1-70B** * | **96.6** | **87.9** | **93.1** | **95.5** | **93.3** |
  | Salesforce/SFR-LLaMa-3.1-70B-Judge-r | 96.9 | 84.8 | 91.6 | 97.6 | 92.7 |
  | Salesforce/SFR-nemo-12B-Judge-r | 97.2 | 82.2 | 86.5 | 95.1 | 90.3 |
- | **Skywork-Critic-Llama3.1-70B** # | **94.4** | **82.9** | **89.7** | **90.2** | **89.3** |
  | **Skywork-Critic-Llama3.1-8B** * | **93.6** | **81.4** | **91.1** | **89.8** | **89.0** |
  | Salesforce/SFR-LLaMa-3.1-8B-Judge-r | 95.5 | 77.7 | 86.2 | 95.1 | 88.7 |
  | facebook/Self-taught-Llama-3-70B | 96.9 | 84.0 | 91.1 | 82.5 | 88.6 |
@@ -52,8 +51,6 @@ As of September 2024, Skywork-Critic-Llama3.1-70B **ranks first** on RewardBench
  | meta-llama/Meta-Llama-3.1-70B-Instruct * | 97.2 | 70.2 | 82.8 | 86.0 | 84.0 |
  | NCSOFT/Llama-3-OffsetBias-8B * | 92.5 | 80.3 | 86.8 | 76.4 | 84.0 |
 
- For the Skywork-Critic-Llama3.1-70B model, we tested two types of prompts. The first simply asks the model to determine whether the response from model A or B is better, while the second prompt, using # to indicate this prompt, requires the model not only to choose the better response but also to provide specific reasoning. Surprisingly, the first approach yielded higher accuracy. Accurately generating critique explanations remains a challenge for the critic model and will be a key focus of our future research.
-
  # Demo Code
  Below are two examples of how to use the Skywork Critic model: as a preference data selector, and as a judge to generate scores and rationales for instruction-response pairs.
 