Update README.md
Browse files
README.md
CHANGED
@@ -8,7 +8,7 @@ license: other
|
|
8 |
This is a text classification model designed to determine the educational value of a piece of text (score 0-5 from low to high). It is similar to the [FineWeb-Edu classifier](https://arxiv.org/abs/2406.17557) and was trained on the same text samples, but using annotations from Mixtral 8x22B-Instruct. In contrast, the original FineWeb-Edu classifier was trained using annotations from Llama 3 70B-Instruct. The NeMo Curator FineWeb Mixtral Edu classifier was used as part of a classifier ensemble in the creation of the [Nemotron-CC](https://arxiv.org/abs/2412.02595) dataset. The models were finetuned starting from the [Snowflake/snowflake-arctic-embed-m](https://huggingface.co/Snowflake/snowflake-arctic-embed-m) model.
|
9 |
|
10 |
## License
|
11 |
-
|
12 |
|
13 |
## References
|
14 |
- [The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale](https://arxiv.org/abs/2406.17557)
|
|
|
8 |
This is a text classification model designed to determine the educational value of a piece of text (score 0-5 from low to high). It is similar to the [FineWeb-Edu classifier](https://arxiv.org/abs/2406.17557) and was trained on the same text samples, but using annotations from Mixtral 8x22B-Instruct. In contrast, the original FineWeb-Edu classifier was trained using annotations from Llama 3 70B-Instruct. The NeMo Curator FineWeb Mixtral Edu classifier was used as part of a classifier ensemble in the creation of the [Nemotron-CC](https://arxiv.org/abs/2412.02595) dataset. The models were finetuned starting from the [Snowflake/snowflake-arctic-embed-m](https://huggingface.co/Snowflake/snowflake-arctic-embed-m) model.
|
9 |
|
10 |
## License
|
11 |
+
GOVERNING TERMS: Use of this model is governed by the [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf). Additional Information: [Apache 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md).
|
12 |
|
13 |
## References
|
14 |
- [The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale](https://arxiv.org/abs/2406.17557)
|