TxT360: Trillion Extracted Text 📖 Create a large, deduplicated dataset for LLM pre-training • Running • 110
Probably function calling datasets Collection Created using the https://huggingface.co/spaces/librarian-bots/dataset-column-search-api Space. • 39 items • Updated Jul 17, 2024 • 37
togethercomputer/m2-bert-80M-32k-retrieval Sentence Similarity • Updated Jan 12, 2024 • 368 • 128
TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T Text Generation • Updated Sep 27, 2024 • 25.1k • 173