Update README.md #1
opened by wping

README.md CHANGED
@@ -51,7 +51,7 @@ print(outputs[0]["generated_text"][-1])
 ## Model Card
 
 * Base model: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
-* Continued Pretraining: 1B tokens on 4M
+* Continued Pretraining: The training data consists of 1B tokens sourced from a pretraining corpus using per-domain upsampling based on sample length. The model was trained for 150 iterations with a sequence length of 4M and a global batch size of 2.
 * Supervised fine-tuning (SFT): 1B tokens on open-source instruction datasets across general, mathematics, and code domains.
 * Maximum context window: 4M tokens
 
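As a quick sanity check on the revised wording, the stated schedule roughly reproduces the quoted token budget. This is only an illustrative sketch, not part of the PR; it assumes "4M" means 4 × 2^20 tokens per sequence and "1B" is read as roughly 10^9 tokens.

```python
# Rough token-budget check for the continued-pretraining line above.
# Assumptions (not stated in the PR): "4M" sequence length = 4 * 2**20 tokens.
iterations = 150
seq_len = 4 * 2**20        # assumed 4M tokens per sequence
global_batch_size = 2

total_tokens = iterations * seq_len * global_batch_size
print(f"{total_tokens:,} tokens")  # 1,258,291,200 -> ~1.26B, on the order of the quoted ~1B tokens
```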