|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- open-r1/Mixture-of-Reasons
|
language: |
|
- en |
|
base_model: |
|
- open-r1/Qwen2.5-Math-7B-RoPE-300k |
|
library_name: transformers |
|
--- |
|
|
|
<img src="open-r1-thumbnail.png" alt="OpenR1-Distill-7B thumbnail" style="display: block; margin: 0 auto;" width="300">
|
|
|
# OpenR1-Distill-7B |
|
|
|
OpenR1-Distill-7B is a post-trained version of [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B), fine-tuned on [Mixture-of-Reasons](https://huggingface.co/datasets/open-r1/Mixture-of-Reasons), a curated dataset of 350k reasoning traces distilled from [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) in the domains of mathematics, coding, and science. It matches the performance of [DeepSeek's 7B distilled model](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) on common reasoning benchmarks (see the performance table below).
|
|
|
## Model description |
|
|
|
- **Model type:** A 7B-parameter GPT-like model fine-tuned on synthetic reasoning traces distilled from DeepSeek-R1.
|
- **Language(s) (NLP):** Primarily English |
|
- **License:** Apache 2.0 |
|
- **Finetuned from model:** a [variant](https://huggingface.co/open-r1/Qwen2.5-Math-7B-RoPE-300k) of [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B), whose RoPE base frequency was extended to 300k to enable training on a context of 32k tokens. |
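
The snippet below is a minimal sketch of how such a RoPE extension can be expressed with 🤗 Transformers; `rope_theta` and `max_position_embeddings` are standard Qwen2 config fields, but the exact procedure used to produce the 300k variant may differ.

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Sketch only: raise the RoPE base frequency of Qwen2.5-Math-7B to 300k so the
# model can be fine-tuned on sequences of up to 32k tokens.
config = AutoConfig.from_pretrained("Qwen/Qwen2.5-Math-7B")
config.rope_theta = 300_000
config.max_position_embeddings = 32_768

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Math-7B", config=config)
model.save_pretrained("Qwen2.5-Math-7B-RoPE-300k")
```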
|
|
|
### Model Sources |
|
|
|
|
|
|
- **Repository:** https://github.com/huggingface/open-r1
- **Base model:** https://huggingface.co/open-r1/Qwen2.5-Math-7B-RoPE-300k
- **Training dataset:** https://huggingface.co/datasets/open-r1/Mixture-of-Reasons
|
|
|
## Performance |
|
|
|
OpenR1-Distill-7B matches the performance of [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) on math, science, and code reasoning benchmarks:
|
|
|
| Model | AIME 2024 | MATH-500 | GPQA-D | LiveCodeBench |
| :---- | :----: | :----: | :----: | :----: |
| OpenR1-Distill-7B | 52.66 | 89.00 | 52.78 | X |
| DeepSeek-R1-Distill-Qwen-7B | 51.25 | 93.45 | 52.40 | 37.41 |
|
|
|
|
|
|
|
|
## Intended uses & limitations |
|
|
|
The model was fine-tuned on [Mixture-of-Reasons](https://huggingface.co/datasets/open-r1/Mixture-of-Reasons), a curated dataset of 350k reasoning traces distilled from [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) in the domains of mathematics, coding, and science. As a result, the model is best suited to reasoning-heavy tasks such as competition mathematics, programming problems, and scientific question answering, where it generates a long chain of thought before giving a final answer.

You can find the dataset used to train OpenR1-Distill-7B [here](https://huggingface.co/datasets/open-r1/Mixture-of-Reasons).
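
If you want to inspect the training data, the traces can be streamed with 🤗 Datasets. The snippet below assumes the dataset ID above and a `train` split; adjust the names if the dataset ships multiple configs.

```python
from datasets import load_dataset

# Stream a few reasoning traces without downloading the full dataset.
ds = load_dataset("open-r1/Mixture-of-Reasons", split="train", streaming=True)
for example in ds.take(2):
    print(example)
```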
|
|
|
Here's how you can run the model using the `pipeline()` function from 🤗 Transformers: |
|
|
|
```python
# pip install transformers accelerate

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="open-r1/OpenR1-Distill-7B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {"role": "user", "content": "Which number is larger, 9.11 or 9.9?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Reasoning traces can run to several thousand tokens, so leave plenty of room.
outputs = pipe(prompt, max_new_tokens=4096, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
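
Like DeepSeek-R1, the model produces a long reasoning trace before its final answer. If the trace follows the R1 convention of being wrapped in `<think> ... </think>` tags (an assumption, not something this card specifies), you can separate it from the answer with a small helper:

```python
def split_reasoning(text: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Split a generation into (reasoning, answer), assuming R1-style <think> tags."""
    if end_tag in text:
        reasoning, answer = text.split(end_tag, maxsplit=1)
        return reasoning.removeprefix("<think>").strip(), answer.strip()
    return "", text.strip()  # no tag found: treat everything as the answer

# Continuing from the pipeline example above:
reasoning, answer = split_reasoning(outputs[0]["generated_text"])
print(answer)
```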
|
|
|
## Bias, Risks, and Limitations |
|
|
|
|
|
|
OpenR1-Distill-7B has not been aligned to human preferences for safety, nor is it deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so).
The exact size and composition of the corpus used to train the base model (`Qwen/Qwen2.5-Math-7B`) are not fully documented, but it likely included a mix of web data and technical sources such as mathematical texts and code. See the [Qwen2.5-Math-7B model card](https://huggingface.co/Qwen/Qwen2.5-Math-7B) for more details.
|
|
|
|
|
## Training and evaluation data |
|
|
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 5e-07 |
|
- train_batch_size: 2 |
|
- eval_batch_size: 4 |
|
- seed: 42 |
|
- distributed_type: multi-GPU |
|
- num_devices: 16 |
|
- total_train_batch_size: 32 |
|
- total_eval_batch_size: 64 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: linear |
|
- lr_scheduler_warmup_ratio: 0.1 |
|
- num_epochs: 3.0 |
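
For reference, here is a minimal sketch of how the hyperparameters above map onto `transformers.TrainingArguments`. The actual run was launched across 16 GPUs, and details such as mixed precision are assumptions rather than documented settings.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
# per_device_train_batch_size=2 across 16 devices gives the total train batch size of 32.
training_args = TrainingArguments(
    output_dir="OpenR1-Distill-7B",
    learning_rate=5e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",        # Adam with betas=(0.9, 0.999) and epsilon=1e-8
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=3.0,
    bf16=True,                  # assumption: bfloat16 mixed precision
)
```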
|
|
|
### Training results |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.35.0.dev0 |
|
- Pytorch 2.0.1+cu118 |
|
- Datasets 2.12.0 |
|
- Tokenizers 0.14.0 |
|
|
|
## Citation |
|
|