File size: 463 Bytes
5fa1a76
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
Before you can use [~TFPreTrainedModel.prepare_tf_dataset], you will need to add the tokenizer outputs to your dataset as columns, as shown in
the following code sample:

def tokenize_dataset(data):
    # Keys of the returned dictionary will be added to the dataset as columns
    return tokenizer(data["text"])
dataset = dataset.map(tokenize_dataset)

Remember that Hugging Face datasets are stored on disk by default, so this will not inflate your memory usage!