RAG / knowledge_base /main_classes_data_collator.txt
Ahmadzei's picture
update 1
57bdca5
Data Collator
Data collators are objects that will form a batch by using a list of dataset elements as input. These elements are of
the same type as the elements of train_dataset or eval_dataset.
To be able to build batches, data collators may apply some processing (like padding). Some of them (like
[DataCollatorForLanguageModeling]) also apply some random data augmentation (like random masking)
on the formed batch.
Examples of use can be found in the example scripts or example notebooks.
Default data collator
[[autodoc]] data.data_collator.default_data_collator
DefaultDataCollator
[[autodoc]] data.data_collator.DefaultDataCollator
DataCollatorWithPadding
[[autodoc]] data.data_collator.DataCollatorWithPadding
DataCollatorForTokenClassification
[[autodoc]] data.data_collator.DataCollatorForTokenClassification
DataCollatorForSeq2Seq
[[autodoc]] data.data_collator.DataCollatorForSeq2Seq
DataCollatorForLanguageModeling
[[autodoc]] data.data_collator.DataCollatorForLanguageModeling
- numpy_mask_tokens
- tf_mask_tokens
- torch_mask_tokens
DataCollatorForWholeWordMask
[[autodoc]] data.data_collator.DataCollatorForWholeWordMask
- numpy_mask_tokens
- tf_mask_tokens
- torch_mask_tokens
DataCollatorForPermutationLanguageModeling
[[autodoc]] data.data_collator.DataCollatorForPermutationLanguageModeling
- numpy_mask_tokens
- tf_mask_tokens
- torch_mask_tokens