lewtun HF Staff committed on
Commit
1050437
·
verified ·
1 Parent(s): 7dea0a9

Update README.md

Files changed (1)
  1. README.md +124 -3
README.md CHANGED
@@ -9,8 +9,129 @@ base_model:
9
  library_name: transformers
10
  ---
11
 
12
- <img src="open-r1-thumbnail.png" alt="Centered Image" style="display: block; margin: 0 auto;">
13
 
14
- # R1-Distill-7B
15
 
16
- R1-Distill-7B is a post-trained version of [Qwen/Qwen2.5-Math-7B](https://huggingface.co/open-r1/Qwen2.5-Math-7B-RoPE-300k)
9
  library_name: transformers
10
  ---
11
 
12
+ <img src="open-r1-thumbnail.png" alt="Centered Image" style="display: block; margin: 0 auto;" width="300">
13
 
14
+ # OpenR1-Distill-7B
15
 
16
+ OpenR1-Distill-7B is a post-trained version of [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B) on around 350k reasoning traces distilled from R1 in the domains of mathematics, coding, and science. This model matches or exceeds the performance of DeepSeek's distilled model,
17
+
18
+ ## Model description
19
+
20
+ - **Model type:** A 7B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
21
+ - **Language(s) (NLP):** Primarily English
22
+ - **License:** Apache 2.0
23
+ - **Finetuned from model:** a [variant](https://huggingface.co/open-r1/Qwen2.5-Math-7B-RoPE-300k) of [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B), whose RoPE base frequency was extended to 300k to enable training on a context of 32k tokens.
24
+
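To illustrate why raising the RoPE base frequency helps with longer contexts, here is a minimal sketch that compares the longest rotary wavelength under two bases. The head dimension of 128 and the baseline base of 10,000 are assumptions chosen for illustration, not values taken from this model card; only the 300k figure comes from the text above.

```python
import math


def rope_wavelengths(head_dim: int, base: float) -> list[float]:
    """Wavelength, in tokens, of each rotary frequency pair for a given base."""
    # Standard RoPE uses theta_i = base ** (-2i / head_dim), so the
    # wavelength of pair i is 2 * pi / theta_i.
    return [2 * math.pi * base ** (2 * i / head_dim) for i in range(head_dim // 2)]


HEAD_DIM = 128  # assumed head dimension, for illustration only

for base in (10_000.0, 300_000.0):  # 10_000 is an assumed baseline
    longest = rope_wavelengths(HEAD_DIM, base)[-1]
    print(f"base={base:>9.0f}  longest wavelength ~ {longest:,.0f} tokens")
```

A larger base stretches the slowest-rotating dimensions over more tokens, which is the usual motivation for increasing it before training on 32k-token sequences.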
25
+ ### Model Sources
26
+
27
+ <!-- Provide the basic links for the model. -->
28
+
29
+ - **Repository:** https://github.com/huggingface/alignment-handbook
30
+ - **Demo:** https://huggingface.co/spaces/HuggingFaceH4/zephyr-chat
31
+ - **Chatbot Arena:** Evaluate Zephyr 7B against 10+ LLMs in the LMSYS arena: http://arena.lmsys.org
32
+
33
+ ## Performance
34
+
35
+ At the time of release, Zephyr-7B-β is the highest ranked 7B chat model on the [MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmarks:
36
+
37
+ | Model | Size | Alignment | MT-Bench (score) | AlpacaEval (win rate %) |
38
+ |-------------|-----|----|---------------|--------------|
39
+ | StableLM-Tuned-α | 7B| dSFT |2.75| -|
40
+ | MPT-Chat | 7B |dSFT |5.42| -|
41
+ | Xwin-LMv0.1 | 7B| dPPO| 6.19| 87.83|
42
+ | Mistral-Instructv0.1 | 7B| - | 6.84 |-|
43
+ | Zephyr-7b-α |7B| dDPO| 6.88| -|
44
+ | **Zephyr-7b-β** 🪁 | **7B** | **dDPO** | **7.34** | **90.60** |
45
+ | Falcon-Instruct | 40B |dSFT |5.17 |45.71|
46
+ | Guanaco | 65B | SFT |6.41| 71.80|
47
+ | Llama2-Chat | 70B |RLHF |6.86| 92.66|
48
+ | Vicuna v1.3 | 33B |dSFT |7.12 |88.99|
49
+ | WizardLM v1.0 | 70B |dSFT |7.71 |-|
50
+ | Xwin-LM v0.1 | 70B |dPPO |- |95.57|
51
+ | GPT-3.5-turbo | - |RLHF |7.94 |89.37|
52
+ | Claude 2 | - |RLHF |8.06| 91.36|
53
+ | GPT-4 | -| RLHF |8.99| 95.28|
54
+
55
+ In particular, on several categories of MT-Bench, Zephyr-7B-β has strong performance compared to larger open models like Llama2-Chat-70B:
56
+
57
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6200d0a443eb0913fa2df7cc/raxvt5ma16d7T23my34WC.png)
58
+
59
+ However, on more complex tasks like coding and mathematics, Zephyr-7B-β lags behind proprietary models and more research is needed to close the gap.
60
+
61
+
62
+ ## Intended uses & limitations
63
+
64
+ The model was initially fine-tuned on a filtered and preprocessed version of the [`UltraChat`](https://huggingface.co/datasets/stingning/ultrachat) dataset, which contains a diverse range of synthetic dialogues generated by ChatGPT.
65
+ We then further aligned the model with [🤗 TRL's](https://github.com/huggingface/trl) `DPOTrainer` on the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, which contains 64k prompts and model completions that are ranked by GPT-4. As a result, the model can be used for chat and you can check out our [demo](https://huggingface.co/spaces/HuggingFaceH4/zephyr-chat) to test its capabilities.
66
+
67
+ You can find the datasets used for training Zephyr-7B-β [here](https://huggingface.co/collections/HuggingFaceH4/zephyr-7b-6538c6d6d5ddd1cbb1744a66)
68
+
69
+ Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:
70
+
71
+ ```python
+ # Install transformers from source - only needed for versions <= v4.34
+ # pip install git+https://github.com/huggingface/transformers.git
+ # pip install accelerate
+
+ import torch
+ from transformers import pipeline
+
+ pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta", torch_dtype=torch.bfloat16, device_map="auto")
+
+ # We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
+ messages = [
+     {
+         "role": "system",
+         "content": "You are a friendly chatbot who always responds in the style of a pirate",
+     },
+     {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
+ ]
+ prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
+ print(outputs[0]["generated_text"])
+ # <|system|>
+ # You are a friendly chatbot who always responds in the style of a pirate.</s>
+ # <|user|>
+ # How many helicopters can a human eat in one sitting?</s>
+ # <|assistant|>
+ # Ah, me hearty matey! But yer question be a puzzler! A human cannot eat a helicopter in one sitting, as helicopters are not edible. They be made of metal, plastic, and other materials, not food!
+ ```
99
+
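To make the prompt layout in the sample output above easier to see, here is a hand-rolled sketch that reproduces it without loading a tokenizer. The function name `format_zephyr_style` is hypothetical, and the layout is inferred only from the printed output in the example; the authoritative template is the one stored with the tokenizer and applied by `apply_chat_template`.

```python
def format_zephyr_style(messages, add_generation_prompt=True, eos="</s>"):
    """Approximate the prompt layout seen in the sample output (illustrative only)."""
    # Each turn is rendered as a role tag on its own line, then the content
    # followed by the end-of-sequence marker.
    parts = [f"<|{m['role']}|>\n{m['content']}{eos}\n" for m in messages]
    if add_generation_prompt:
        # An open assistant tag cues the model to start its reply.
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "Hello!"},
]
print(format_zephyr_style(messages))
```

In practice the tokenizer's own template should be preferred, since a manual formatter can silently drift from what the model saw during training.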
100
+ ## Bias, Risks, and Limitations
101
+
102
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
103
+
104
+ Zephyr-7B-β has not been aligned to human preferences for safety within the RLHF phase or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so).
105
+ The size and composition of the corpus used to train the base model (`mistralai/Mistral-7B-v0.1`) are also unknown; however, it likely included a mix of Web data and technical sources like books and code. See the [Falcon 180B model card](https://huggingface.co/tiiuae/falcon-180B#training-data) for an example of this.
106
+
107
+
108
+ ## Training and evaluation data
109
+
110
+
111
+ ### Training hyperparameters
112
+
113
+ The following hyperparameters were used during training:
114
+ - learning_rate: 5e-07
115
+ - train_batch_size: 2
116
+ - eval_batch_size: 4
117
+ - seed: 42
118
+ - distributed_type: multi-GPU
119
+ - num_devices: 16
120
+ - total_train_batch_size: 32
121
+ - total_eval_batch_size: 64
122
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
123
+ - lr_scheduler_type: linear
124
+ - lr_scheduler_warmup_ratio: 0.1
125
+ - num_epochs: 3.0
126
+
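As a quick check on how the listed values relate, the sketch below recomputes the total batch sizes and evaluates a linear warmup-then-decay schedule. It assumes no gradient accumulation (consistent with 2 × 16 = 32) and a hypothetical run length of 1,000 optimizer steps; the helper names are illustrative and not part of any library.

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Total examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps


def linear_schedule_lr(step: int, total_steps: int, peak_lr: float = 5e-7,
                       warmup_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 1_000  # hypothetical run length, for illustration only

print(effective_batch_size(2, 16))  # train: per-device 2 on 16 devices
print(effective_batch_size(4, 16))  # eval: per-device 4 on 16 devices
for step in (0, 50, 100, 550, 1_000):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

With these assumptions the learning rate peaks at 5e-07 once 10% of the steps have elapsed and reaches zero at the final step.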
127
+ ### Training results
128
+
129
+
130
+ ### Framework versions
131
+
132
+ - Transformers 4.35.0.dev0
133
+ - Pytorch 2.0.1+cu118
134
+ - Datasets 2.12.0
135
+ - Tokenizers 0.14.0
136
+
137
+ ## Citation