Update README.md #1
opened by wping

README.md CHANGED
@@ -51,7 +51,7 @@ print(outputs[0]["generated_text"][-1])
 ## Model Card
 
 * Base model: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
-* Continued Pretraining: 1B tokens on 4M
+* Continued Pretraining: The training data consists of 1B tokens sourced from a pretraining corpus using per-domain upsampling based on sample length. The model was trained for 150 iterations with a sequence length of 4M and a global batch size of 2.
 * Supervised fine-tuning (SFT): 1B tokens on open-source instruction datasets across general, mathematics, and code domains.
 * Maximum context window: 4M tokens
 
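As a quick sanity check on the revised wording, the stated schedule roughly reproduces the quoted token budget. This is only an illustrative sketch, not part of the PR; it assumes "4M" means 4 × 2^20 tokens per sequence and "1B" is read as roughly 10^9 tokens.

```python
# Rough token-budget check for the continued-pretraining line above.
# Assumptions (not stated in the PR): "4M" sequence length = 4 * 2**20 tokens.
iterations = 150
seq_len = 4 * 2**20        # assumed 4M tokens per sequence
global_batch_size = 2

total_tokens = iterations * seq_len * global_batch_size
print(f"{total_tokens:,} tokens")  # 1,258,291,200 -> ~1.26B, on the order of the quoted ~1B tokens
```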