Update README.md

README.md
library_name: transformers
---

<img src="open-r1-thumbnail.png" alt="Centered Image" style="display: block; margin: 0 auto;" width="300">

# OpenR1-Distill-7B

OpenR1-Distill-7B is a post-trained version of [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B) on around 350k reasoning traces distilled from R1 in the domains of mathematics, coding, and science. This model matches or exceeds the performance of DeepSeek's distilled model.
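
A minimal way to load the model for generation is sketched below. This assumes the checkpoint is published under this card's repo id, `open-r1/OpenR1-Distill-7B`; adjust the id if it lives elsewhere.

```python
# Minimal sketch: load this card's model for text generation.
# Assumes the repo id "open-r1/OpenR1-Distill-7B" (an assumption of
# this sketch, not stated in the card); adjust if needed.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="open-r1/OpenR1-Distill-7B",
    torch_dtype=torch.bfloat16,  # half precision keeps the 7B model within one GPU
    device_map="auto",           # spread layers across available devices
)
print(pipe("What is 7 * 6?", max_new_tokens=128)[0]["generated_text"])
```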

## Model description

- **Model type:** A 7B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
- **Language(s) (NLP):** Primarily English
- **License:** Apache 2.0
- **Finetuned from model:** a [variant](https://huggingface.co/open-r1/Qwen2.5-Math-7B-RoPE-300k) of [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B), whose RoPE base frequency was extended to 300k to enable training on a context of 32k tokens (see the sketch after this list).
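
As a rough illustration of that RoPE change (a hedged sketch, not the exact procedure used to produce the published variant): Qwen2-style configs in 🤗 Transformers expose the base frequency as `rope_theta`, so the extension amounts to overriding two config fields before loading the weights.

```python
# Sketch: extend the RoPE base frequency of Qwen2.5-Math-7B to 300k so the
# model can be trained on a 32k-token context. Illustrative only; the
# published variant is open-r1/Qwen2.5-Math-7B-RoPE-300k.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("Qwen/Qwen2.5-Math-7B")
config.rope_theta = 300_000               # raise the RoPE base frequency
config.max_position_embeddings = 32_768   # allow a 32k-token context window

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Math-7B", config=config)
```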

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/huggingface/alignment-handbook
- **Demo:** https://huggingface.co/spaces/HuggingFaceH4/zephyr-chat
- **Chatbot Arena:** Evaluate Zephyr 7B against 10+ LLMs in the LMSYS arena: http://arena.lmsys.org

## Performance

At the time of release, Zephyr-7B-β is the highest ranked 7B chat model on the [MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmarks:

| Model | Size | Alignment | MT-Bench (score) | AlpacaEval (win rate %) |
|-------|------|-----------|------------------|-------------------------|
| StableLM-Tuned-α | 7B | dSFT | 2.75 | - |
| MPT-Chat | 7B | dSFT | 5.42 | - |
| Xwin-LM v0.1 | 7B | dPPO | 6.19 | 87.83 |
| Mistral-Instruct v0.1 | 7B | - | 6.84 | - |
| Zephyr-7b-α | 7B | dDPO | 6.88 | - |
| **Zephyr-7b-β** 🪁 | **7B** | **dDPO** | **7.34** | **90.60** |
| Falcon-Instruct | 40B | dSFT | 5.17 | 45.71 |
| Guanaco | 65B | SFT | 6.41 | 71.80 |
| Llama2-Chat | 70B | RLHF | 6.86 | 92.66 |
| Vicuna v1.3 | 33B | dSFT | 7.12 | 88.99 |
| WizardLM v1.0 | 70B | dSFT | 7.71 | - |
| Xwin-LM v0.1 | 70B | dPPO | - | 95.57 |
| GPT-3.5-turbo | - | RLHF | 7.94 | 89.37 |
| Claude 2 | - | RLHF | 8.06 | 91.36 |
| GPT-4 | - | RLHF | 8.99 | 95.28 |

In particular, on several categories of MT-Bench, Zephyr-7B-β has strong performance compared to larger open models like Llama2-Chat-70B:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6200d0a443eb0913fa2df7cc/raxvt5ma16d7T23my34WC.png)

However, on more complex tasks like coding and mathematics, Zephyr-7B-β lags behind proprietary models, and more research is needed to close the gap.

## Intended uses & limitations

The model was initially fine-tuned on a filtered and preprocessed version of the [`UltraChat`](https://huggingface.co/datasets/stingning/ultrachat) dataset, which contains a diverse range of synthetic dialogues generated by ChatGPT.
We then further aligned the model with [🤗 TRL's](https://github.com/huggingface/trl) `DPOTrainer` on the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, which contains 64k prompts and model completions that are ranked by GPT-4. As a result, the model can be used for chat and you can check out our [demo](https://huggingface.co/spaces/HuggingFaceH4/zephyr-chat) to test its capabilities.

You can find the datasets used for training Zephyr-7B-β [here](https://huggingface.co/collections/HuggingFaceH4/zephyr-7b-6538c6d6d5ddd1cbb1744a66).
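
For a sense of what that alignment step looks like in code, here is a hedged sketch of TRL's `DPOTrainer`. It assumes the binarized variant of the dataset (`HuggingFaceH4/ultrafeedback_binarized`, which pre-pairs chosen/rejected completions); the hyperparameters are placeholders, not the values used for Zephyr, and keyword names have shifted across TRL releases.

```python
# Sketch: DPO alignment with TRL on the binarized UltraFeedback dataset.
# Placeholder hyperparameters; exact kwargs vary by TRL version.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Each example pairs a prompt with a GPT-4-preferred ("chosen") and a
# dispreferred ("rejected") completion.
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

args = DPOConfig(output_dir="zephyr-7b-dpo", beta=0.1)  # beta controls KL regularization strength
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # named `tokenizer` in older TRL releases
)
trainer.train()
```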

Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:

```python
# Install transformers from source - only needed for versions <= v4.34
# pip install git+https://github.com/huggingface/transformers.git
# pip install accelerate

import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta", torch_dtype=torch.bfloat16, device_map="auto")

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
# <|system|>
# You are a friendly chatbot who always responds in the style of a pirate.</s>
# <|user|>
# How many helicopters can a human eat in one sitting?</s>
# <|assistant|>
# Ah, me hearty matey! But yer question be a puzzler! A human cannot eat a helicopter in one sitting, as helicopters are not edible. They be made of metal, plastic, and other materials, not food!
```

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Zephyr-7B-β has not been aligned to human preferences for safety within the RLHF phase or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so).
The size and composition of the corpus used to train the base model (`mistralai/Mistral-7B-v0.1`) are also unknown, but it likely included a mix of web data and technical sources like books and code. See the [Falcon 180B model card](https://huggingface.co/tiiuae/falcon-180B#training-data) for an example of this.

## Training and evaluation data

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 5e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- total_train_batch_size: 32
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
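
For orientation, here is how those settings map onto 🤗 Transformers `TrainingArguments`. This is a hedged sketch, not the original training script: `output_dir` is a placeholder, and the listed Adam betas/epsilon are simply the optimizer's defaults. The totals check out as per-device batch size times device count: 2 × 16 = 32 for training and 4 × 16 = 64 for evaluation.

```python
# Sketch: the hyperparameters above expressed as TrainingArguments.
# Illustrative only - "zephyr-7b-dpo" is a placeholder output path.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo",        # placeholder
    learning_rate=5e-07,
    per_device_train_batch_size=2,     # x 16 devices = total train batch size of 32
    per_device_eval_batch_size=4,      # x 16 devices = total eval batch size of 64
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=3.0,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the defaults, so no
    # explicit adam_beta1/adam_beta2/adam_epsilon overrides are needed.
)
```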

### Training results

### Framework versions

- Transformers 4.35.0.dev0
- Pytorch 2.0.1+cu118
- Datasets 2.12.0
- Tokenizers 0.14.0

## Citation