Zhangchen Xu committed
Update README.md

README.md CHANGED
@@ -8,8 +8,8 @@ tags:
 - dpo
 - generated_from_trainer
 datasets:
-- Magpie-Align/MagpieLM-4B-SFT-v0.1
-- Magpie-Align/MagpieLM-4B-DPO-v0.1
+- Magpie-Align/MagpieLM-4B-SFT-Data-v0.1
+- Magpie-Align/MagpieLM-4B-DPO-Data-v0.1
 model-index:
 - name: MagpieLM-4B-Chat-v0.1
   results: []
@@ -27,12 +27,12 @@ model-index:

 This model is an aligned version of [Llama-3.1-Minitron-4B-Width](https://huggingface.co/nvidia/Llama-3.1-Minitron-4B-Width-Base), which achieves state-of-the-art performance among open-aligned SLMs. It even outperforms larger open-weight models including Llama-3-8B-Instruct, Llama-3.1-8B-Instruct and Qwen-2-7B-Instruct.

-We apply the following standard alignment pipeline with two carefully crafted synthetic datasets.
+We apply the following standard alignment pipeline with two carefully crafted synthetic datasets. Feel free to use these datasets and reproduce our model, or make your own friendly chatbots :)

-We first perform SFT using [Magpie-Align/MagpieLM-4B-SFT-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-4B-SFT-v0.1).
+We first perform SFT using [Magpie-Align/MagpieLM-4B-SFT-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-4B-SFT-Data-v0.1).
 * **SFT Model Checkpoint:** [Magpie-Align/MagpieLM-4B-SFT-v0.1](https://huggingface.co/Magpie-Align/MagpieLM-4B-SFT-v0.1)

-We then perform DPO on the [Magpie-Align/MagpieLM-4B-DPO-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-4B-DPO-v0.1) dataset.
+We then perform DPO on the [Magpie-Align/MagpieLM-4B-DPO-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-4B-DPO-Data-v0.1) dataset.

 ## 🔥 Benchmark Performance

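For readers who want to inspect the two renamed datasets before reproducing the pipeline, here is a minimal sketch using the `datasets` library (assuming a recent `datasets` release and Hub access); it only prints the splits and columns each dataset exposes.

```python
from datasets import load_dataset

# Hedged sketch: pull the SFT and DPO datasets referenced in the updated card
# and list their splits and column names to see what the alignment pipeline consumes.
for repo_id in [
    "Magpie-Align/MagpieLM-4B-SFT-Data-v0.1",
    "Magpie-Align/MagpieLM-4B-DPO-Data-v0.1",
]:
    ds = load_dataset(repo_id)  # DatasetDict keyed by split name
    for split, data in ds.items():
        print(repo_id, split, len(data), data.column_names)
```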
@@ -62,7 +62,7 @@ You can then run conversational inference using the Transformers `pipeline` abstraction
 import transformers
 import torch

-model_id = "
+model_id = "MagpieLM-4B-Chat-v0.1"

 pipeline = transformers.pipeline(
     "text-generation",
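The hunk above only shows a fragment of the card's inference snippet. A self-contained sketch of how the `pipeline` call might be completed is below; the full Hub id, the example messages, and the generation settings are illustrative assumptions rather than the card's exact values.

```python
import torch
import transformers

# Assumed full Hub id; the diff itself only shows "MagpieLM-4B-Chat-v0.1".
model_id = "Magpie-Align/MagpieLM-4B-Chat-v0.1"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Recent transformers versions accept chat-style message lists directly.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi! Who are you?"},
]
outputs = pipeline(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1]["content"])
```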
@@ -107,7 +107,7 @@ load_in_4bit: false
 strict: false

 datasets:
-  - path: Magpie-Align/MagpieLM-4B-SFT-v0.1
+  - path: Magpie-Align/MagpieLM-4B-SFT-Data-v0.1
     type: sharegpt
     conversation: llama3
 dataset_prepared_path: last_run_prepared
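In this axolotl config, `type: sharegpt` with `conversation: llama3` means ShareGPT-style conversation records are rendered with the Llama-3 chat template. A hedged sketch of what that rendering looks like, using the SFT checkpoint named in the card (the conversation content is made up for illustration):

```python
from transformers import AutoTokenizer

# Tokenizer of the SFT checkpoint listed in the card; the message below is a made-up example.
tokenizer = AutoTokenizer.from_pretrained("Magpie-Align/MagpieLM-4B-SFT-v0.1")

messages = [
    {"role": "user", "content": "What does 'conversation: llama3' mean in the axolotl config?"},
]

# Render with the Llama-3 chat template, appending the assistant header so a
# model would generate the reply next.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```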
@@ -223,7 +223,7 @@ output_dir: alignment_handbook_out/MagpieLM-4B-Chat-v0.1
 run_name: MagpieLM-4B-Chat-v0.1

 dataset_mixer:
-  Magpie-Align/MagpieLM-4B-DPO-v0.1: 1.0
+  Magpie-Align/MagpieLM-4B-DPO-Data-v0.1: 1.0
 dataset_splits:
 - train
 - test
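The `dataset_mixer` and `dataset_splits` keys come from the alignment-handbook DPO config. As a rough illustration of the config's intent (not the handbook's actual loader), a weight of 1.0 keeps every example of each listed split, while a smaller weight would subsample proportionally; split names here are assumptions guarded at runtime.

```python
from datasets import load_dataset

# Illustration only: mirror the dataset_mixer / dataset_splits settings above.
dataset_mixer = {"Magpie-Align/MagpieLM-4B-DPO-Data-v0.1": 1.0}
dataset_splits = ["train", "test"]

for repo_id, frac in dataset_mixer.items():
    ds = load_dataset(repo_id)
    for split in dataset_splits:
        if split in ds:  # guard: the available split names are an assumption
            keep = int(len(ds[split]) * frac)
            print(f"{repo_id} [{split}]: keeping {keep} of {len(ds[split])} examples")
```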