Update README.md
README.md CHANGED
@@ -40,7 +40,6 @@ As of September 2024, Skywork-Critic-Llama3.1-70B **ranks first** on RewardBench
| **Skywork-Critic-Llama3.1-70B** * | **96.6** | **87.9** | **93.1** | **95.5** | **93.3** |
| Salesforce/SFR-LLaMa-3.1-70B-Judge-r | 96.9 | 84.8 | 91.6 | 97.6 | 92.7 |
| Salesforce/SFR-nemo-12B-Judge-r | 97.2 | 82.2 | 86.5 | 95.1 | 90.3 |
-| **Skywork-Critic-Llama3.1-70B** # | **94.4** | **82.9** | **89.7** | **90.2** | **89.3** |
| **Skywork-Critic-Llama3.1-8B** * | **93.6** | **81.4** | **91.1** | **89.8** | **89.0** |
| Salesforce/SFR-LLaMa-3.1-8B-Judge-r | 95.5 | 77.7 | 86.2 | 95.1 | 88.7 |
| facebook/Self-taught-Llama-3-70B | 96.9 | 84.0 | 91.1 | 82.5 | 88.6 |
@@ -52,8 +51,6 @@ As of September 2024, Skywork-Critic-Llama3.1-70B **ranks first** on RewardBench
| meta-llama/Meta-Llama-3.1-70B-Instruct * | 97.2 | 70.2 | 82.8 | 86.0 | 84.0 |
| NCSOFT/Llama-3-OffsetBias-8B * | 92.5 | 80.3 | 86.8 | 76.4 | 84.0 |

-For the Skywork-Critic-Llama3.1-70B model, we tested two types of prompts. The first simply asks the model to determine whether the response from model A or model B is better; the second, marked with #, requires the model not only to choose the better response but also to provide specific reasoning. Surprisingly, the first approach yielded higher accuracy. Accurately generating critique explanations remains a challenge for the critic model and will be a key focus of our future research.
-
# Demo Code
Below are two examples of how to use the Skywork Critic model: as a preference data selector, and as a judge to generate scores and rationales for instruction-response pairs.
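
For orientation, here is a minimal sketch of the pairwise-judge usage referenced above, assuming the model is loaded through Hugging Face `transformers` with the standard Llama-3.1 chat template. The repo id `Skywork/Skywork-Critic-Llama3.1-8B` and the prompt wording below are illustrative placeholders, not the official Skywork judge prompt.

```python
# Illustrative sketch only: the prompt text is a placeholder, not the official
# Skywork judge prompt, and the repo id is assumed rather than taken from this diff.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Skywork/Skywork-Critic-Llama3.1-8B"  # the 70B variant is used the same way
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

instruction = "Explain in one sentence why the sky is blue."
response_a = "Air molecules scatter blue light more strongly than red light, so scattered blue light dominates the sky."
response_b = "The sky is blue because it reflects the color of the ocean."

# Pairwise prompt: ask the critic which response better follows the instruction.
prompt = (
    "Given an instruction and two candidate responses, decide which response is better.\n\n"
    f"[Instruction]\n{instruction}\n\n"
    f"[Response A]\n{response_a}\n\n"
    f"[Response B]\n{response_b}\n\n"
    'Answer with "A" or "B".'
)

messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding with a tiny token budget keeps the output to a bare verdict.
output = model.generate(input_ids, max_new_tokens=8, do_sample=False)
verdict = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(verdict)  # expected to contain "A" or "B"
```

Asking the model to also justify its choice (by extending the prompt and raising `max_new_tokens`) corresponds to the #-marked prompt variant discussed above, which in our tests scored slightly lower than the verdict-only prompt.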