tf_dataset = model.prepare_tf_dataset(dataset["train"], batch_size=16, shuffle=True, tokenizer=tokenizer) Note that in the code sample above, you need to pass the tokenizer to prepare_tf_dataset so it can correctly pad batches as they're loaded.