|
|
|
Data Collator |
|
Data collators are objects that will form a batch by using a list of dataset elements as input. These elements are of |
|
the same type as the elements of train_dataset or eval_dataset. |
|
To be able to build batches, data collators may apply some processing (like padding). Some of them (like |
|
[DataCollatorForLanguageModeling]) also apply some random data augmentation (like random masking) |
|
on the formed batch. |
|
Examples of use can be found in the example scripts or example notebooks. |
|
Default data collator |
|
[[autodoc]] data.data_collator.default_data_collator |
|
DefaultDataCollator |
|
[[autodoc]] data.data_collator.DefaultDataCollator |
|
DataCollatorWithPadding |
|
[[autodoc]] data.data_collator.DataCollatorWithPadding |
|
DataCollatorForTokenClassification |
|
[[autodoc]] data.data_collator.DataCollatorForTokenClassification |
|
DataCollatorForSeq2Seq |
|
[[autodoc]] data.data_collator.DataCollatorForSeq2Seq |
|
DataCollatorForLanguageModeling |
|
[[autodoc]] data.data_collator.DataCollatorForLanguageModeling |
|
- numpy_mask_tokens |
|
- tf_mask_tokens |
|
- torch_mask_tokens |
|
DataCollatorForWholeWordMask |
|
[[autodoc]] data.data_collator.DataCollatorForWholeWordMask |
|
- numpy_mask_tokens |
|
- tf_mask_tokens |
|
- torch_mask_tokens |
|
DataCollatorForPermutationLanguageModeling |
|
[[autodoc]] data.data_collator.DataCollatorForPermutationLanguageModeling |
|
- numpy_mask_tokens |
|
- tf_mask_tokens |
|
- torch_mask_tokens |