Commit d4d141d (verified) · committed by sarahyurick · 1 parent: bfb3a90

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -8,7 +8,7 @@ license: other
  This is a text classification model designed to determine the educational value of a piece of text (score 0-5 from low to high). It is similar to the [FineWeb-Edu classifier](https://arxiv.org/abs/2406.17557) and was trained on the same text samples, but using annotations from Mixtral 8x22B-Instruct. In contrast, the original FineWeb-Edu classifier was trained using annotations from Llama 3 70B-Instruct. The NeMo Curator FineWeb Mixtral Edu classifier was used as part of a classifier ensemble in the creation of the [Nemotron-CC](https://arxiv.org/abs/2412.02595) dataset. The models were finetuned starting from the [Snowflake/snowflake-arctic-embed-m](https://huggingface.co/Snowflake/snowflake-arctic-embed-m) model.
 
  ## License
- This model is released under the [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf).
+ GOVERNING TERMS: Use of this model is governed by the [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf). Additional Information: [Apache 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md).
 
  ## References
  - [The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale](https://arxiv.org/abs/2406.17557)