Tags: Text Generation · Transformers · Safetensors · llama · alignment-handbook · trl · dpo · Generated from Trainer · conversational · text-generation-inference
Zhangchen Xu committed (verified)
Commit 2ce10db · 1 Parent(s): c319437

Update README.md

Files changed (1): README.md +8 -8
README.md CHANGED
@@ -8,8 +8,8 @@ tags:
 - dpo
 - generated_from_trainer
 datasets:
-- Magpie-Align/MagpieLM-4B-SFT-v0.1
-- Magpie-Align/MagpieLM-4B-DPO-v0.1
+- Magpie-Align/MagpieLM-4B-SFT-Data-v0.1
+- Magpie-Align/MagpieLM-4B-DPO-Data-v0.1
 model-index:
 - name: MagpieLM-4B-Chat-v0.1
   results: []
@@ -27,12 +27,12 @@ model-index:
 
 This model is an aligned version of [Llama-3.1-Minitron-4B-Width](https://huggingface.co/nvidia/Llama-3.1-Minitron-4B-Width-Base), which achieves state-of-the-art performance among open-aligned SLMs. It even outperforms larger open-weight models including Llama-3-8B-Instruct, Llama-3.1-8B-Instruct and Qwen-2-7B-Instruct.
 
-We apply the following standard alignment pipeline with two carefully crafted synthetic datasets. The detailed synthetic dataset generation pipeline will be available to public soon. Before that, feel free to use these datasets and reproduce our model, or make your own friendly chatbots :)
+We apply the following standard alignment pipeline with two carefully crafted synthetic datasets. Feel free to use these datasets and reproduce our model, or make your own friendly chatbots :)
 
-We first perform SFT using [Magpie-Align/MagpieLM-4B-SFT-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-4B-SFT-v0.1).
+We first perform SFT using [Magpie-Align/MagpieLM-4B-SFT-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-4B-SFT-Data-v0.1).
 * **SFT Model Checkpoint:** [Magpie-Align/MagpieLM-4B-SFT-v0.1](https://huggingface.co/Magpie-Align/MagpieLM-4B-SFT-v0.1)
 
-We then perform DPO on the [Magpie-Align/MagpieLM-4B-DPO-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-4B-DPO-v0.1) dataset.
+We then perform DPO on the [Magpie-Align/MagpieLM-4B-DPO-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-4B-DPO-Data-v0.1) dataset.
 
 ## 🔥 Benchmark Performance
 
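The dataset ids added in this hunk link to dataset repos on the Hugging Face Hub. As a minimal sketch of pulling them for inspection or reproduction (assuming both remain publicly readable; split and field names come from the dataset cards, not from this diff):

```python
# Sketch: load the two synthetic datasets named in the updated README.
# Assumes public Hub access; splits/fields are whatever the dataset repos define.
from datasets import load_dataset

sft_data = load_dataset("Magpie-Align/MagpieLM-4B-SFT-Data-v0.1")  # SFT-stage conversations
dpo_data = load_dataset("Magpie-Align/MagpieLM-4B-DPO-Data-v0.1")  # DPO preference data

print(sft_data)
print(dpo_data)
```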
 
@@ -62,7 +62,7 @@ You can then run conversational inference using the Transformers `pipeline` abst
 import transformers
 import torch
 
-model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
+model_id = "MagpieLM-4B-Chat-v0.1"
 
 pipeline = transformers.pipeline(
     "text-generation",
@@ -107,7 +107,7 @@ load_in_4bit: false
 strict: false
 
 datasets:
-  - path: Magpie-Align/MagpieLM-4B-SFT-v0.1
+  - path: Magpie-Align/MagpieLM-4B-SFT-Data-v0.1
     type: sharegpt
     conversation: llama3
 dataset_prepared_path: last_run_prepared
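In this axolotl stanza, `type: sharegpt` with `conversation: llama3` means sharegpt-style turns are rendered with the Llama-3 chat template before tokenization. A rough illustration of that mapping (the example record, the role mapping, and the use of the SFT checkpoint's tokenizer are assumptions, not axolotl's actual code):

```python
# Rough sketch: sharegpt turns -> role/content messages -> Llama-3 chat template.
from transformers import AutoTokenizer

record = {  # hypothetical sharegpt-style record
    "conversations": [
        {"from": "human", "value": "What do magpies eat?"},
        {"from": "gpt", "value": "Mostly insects, seeds, and the occasional shiny snack."},
    ]
}

role_map = {"human": "user", "gpt": "assistant"}
messages = [{"role": role_map[t["from"]], "content": t["value"]} for t in record["conversations"]]

# Assumes the SFT checkpoint's tokenizer ships the Llama-3 chat template.
tok = AutoTokenizer.from_pretrained("Magpie-Align/MagpieLM-4B-SFT-v0.1")
print(tok.apply_chat_template(messages, tokenize=False))
```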
@@ -223,7 +223,7 @@ output_dir: alignment_handbook_out/MagpieLM-4B-Chat-v0.1
 run_name: MagpieLM-4B-Chat-v0.1
 
 dataset_mixer:
-  Magpie-Align/MagpieLM-4B-DPO-v0.1: 1.0
+  Magpie-Align/MagpieLM-4B-DPO-Data-v0.1: 1.0
 dataset_splits:
 - train
 - test
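In the alignment-handbook DPO config, `dataset_mixer` maps each preference dataset to the fraction of its training split mixed into the run (1.0 = all of it), with `dataset_splits` naming the splits to load. A simplified sketch of that behaviour (an approximation for illustration, not the handbook's implementation):

```python
# Simplified approximation of the dataset_mixer semantics shown above.
from datasets import concatenate_datasets, load_dataset

dataset_mixer = {"Magpie-Align/MagpieLM-4B-DPO-Data-v0.1": 1.0}  # from the updated config

parts = []
for name, frac in dataset_mixer.items():
    train = load_dataset(name, split="train")
    parts.append(train.select(range(int(frac * len(train)))))  # take the requested fraction

mixed_train = concatenate_datasets(parts)
print(mixed_train)
```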
 