Update README.md
Skywork-Critic-Llama3.1-70B and Skywork-Critic-Llama3.1-8B are built on Meta [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) and [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct), respectively. Both models were fine-tuned on a diverse collection of high-quality data, including:

- **Cleaned open-source data**: We use a high-quality subset of [HelpSteer2](https://huggingface.co/datasets/nvidia/HelpSteer2), [OffsetBias](https://huggingface.co/datasets/NCSOFT/offsetbias), [WildGuard (adversarial)](https://huggingface.co/allenai/wildguard), and the Magpie DPO series ([Ultra](https://huggingface.co/datasets/argilla/magpie-ultra-v0.1), [Pro (Llama-3.1)](https://huggingface.co/datasets/Magpie-Align/Magpie-Llama-3.1-Pro-DPO-100K-v0.1), [Pro](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-DPO-100K-v0.1), [Air](https://huggingface.co/datasets/Magpie-Align/Magpie-Air-DPO-100K-v0.1)). For more details, please refer to our [Skywork-Reward-Preference-80K-v0.1 dataset](https://huggingface.co/datasets/Skywork/Skywork-Reward-Preference-80K-v0.1). Additionally, we integrate several open-source, high-quality critic datasets, such as [Open-Critic-GPT](https://huggingface.co/datasets/Vezora/Open-Critic-GPT), into our training process.

- **In-house human annotation data**: This includes both pointwise scoring of a single response across many dimensions and pairwise comparisons between two responses, with each dimension's score accompanied by a rationale. Note that manually labeled data is very expensive to obtain; we have only a few hundred such examples, all in Chinese, so the model's single-response (pointwise) rating ability may not be particularly strong.

- **Synthetic critic data**: We use an approach similar to [**self-taught**](https://arxiv.org/abs/2408.02666). Specifically, we employ two methods to generate an inferior response for a given instruction: 1) creating a similar instruction and then generating a response to that new instruction, and 2) introducing subtle errors into a high-quality response. A sketch of both methods follows this list.

- **Critic-related chat data**: We incorporate critic-related chat data to maintain the model's conversational capabilities.
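As referenced in the synthetic critic data bullet above, the sketch below illustrates the two inferior-response generation methods. It is a minimal, hypothetical rendering: `generate` is a stand-in for any instruction-tuned LLM call, and the prompts and helper names are assumptions for illustration, not the exact pipeline used in training.

```python
# Illustrative sketch of the two synthetic "inferior response" strategies.
# `generate(prompt)` is a hypothetical stand-in for an instruction-tuned
# LLM call; it is NOT the actual pipeline used to train these models.

def generate(prompt: str) -> str:
    """Hypothetical LLM call; replace with your own inference client."""
    raise NotImplementedError

def inferior_by_similar_instruction(instruction: str) -> str:
    # Method 1: write a *similar* instruction, then answer that one instead.
    # The result is on-topic but subtly misaligned with the original request.
    similar = generate(
        "Rewrite the following instruction so it is similar but not "
        f"identical in intent:\n{instruction}"
    )
    return generate(similar)

def inferior_by_subtle_errors(instruction: str, good_response: str) -> str:
    # Method 2: take a high-quality response and inject subtle mistakes
    # (small factual or logical errors) while keeping the style intact.
    return generate(
        "Introduce a few subtle factual or logical errors into this answer "
        f"without changing its tone or structure.\n\nQuestion: {instruction}"
        f"\n\nAnswer: {good_response}"
    )

def make_preference_pair(instruction: str, good_response: str) -> dict:
    # Pair the original high-quality response (chosen) with a synthetic
    # inferior one (rejected) to form a pairwise critic training example.
    return {
        "instruction": instruction,
        "chosen": good_response,
        "rejected": inferior_by_subtle_errors(instruction, good_response),
    }
```

Either path yields a (chosen, rejected) pair for a given instruction, which matches the pairwise-comparison format described in the annotation bullets above.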