Example where it's mostly a speedup:

```python
from transformers import pipeline
from torch.utils.data import Dataset
from tqdm.auto import tqdm

pipe = pipeline("text-classification", device=0)


class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        return "This is a test"


dataset = MyDataset()

for batch_size in [1, 8, 64, 256]:
    print("-" * 30)
    print(f"Streaming batch_size={batch_size}")
    for out in tqdm(pipe(dataset, batch_size=batch_size), total=len(dataset)):
        pass
```

```
# On GTX 970
------------------------------
Streaming no batching
100%|██████████████████████████████████████████████████████████████████████| 5000/5000 [00:26<00:00, 187.52it/s]
------------------------------
Streaming batch_size=8
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:04<00:00, 1205.95it/s]
------------------------------
Streaming batch_size=64
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:02<00:00, 2478.24it/s]
------------------------------
Streaming batch_size=256
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:01<00:00, 2554.43it/s]
(diminishing returns, saturated the GPU)
```

Example where it's mostly a slowdown:

```python
class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        if i % 64 == 0:
            n = 100
        else:
            n = 1
        return "This is a test" * n
```

This dataset returns an occasional very long sentence compared to the others.
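To see why batching can hurt here, note that batched inputs are padded to the longest sequence in the batch. Below is a minimal sketch of that effect (the `distilbert-base-uncased` checkpoint is an assumption, used only for illustration; any tokenizer shows the same behavior): a single long outlier inflates the padded length of the whole batch, so every short sample pays for the outlier's compute.

```python
from transformers import AutoTokenizer

# Assumed checkpoint, chosen only for illustration; any tokenizer behaves the same.
tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# A batch of 63 short samples plus the one long outlier from the dataset above.
batch = ["This is a test"] * 63 + ["This is a test" * 100]
enc = tok(batch, padding=True, return_tensors="pt")

# Every row is padded to the outlier's length (~400 tokens instead of ~6, the
# exact count depends on the tokenizer), so the model runs a [64, ~400] batch
# even though 63 of the 64 samples are tiny.
print(enc["input_ids"].shape)
```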