|
Example where it's mostly a speedup: |
|
|
|
```python
from transformers import pipeline
from torch.utils.data import Dataset
from tqdm.auto import tqdm

pipe = pipeline("text-classification", device=0)


class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        return "This is a test"


dataset = MyDataset()

for batch_size in [1, 8, 64, 256]:
    print("-" * 30)
    print(f"Streaming batch_size={batch_size}")
    for out in tqdm(pipe(dataset, batch_size=batch_size), total=len(dataset)):
        pass
```
|
|
|
```
# On GTX 970
------------------------------
Streaming no batching
100%|██████████████████████████████████████████████████████████████████████| 5000/5000 [00:26<00:00, 187.52it/s]
------------------------------
Streaming batch_size=8
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:04<00:00, 1205.95it/s]
------------------------------
Streaming batch_size=64
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:02<00:00, 2478.24it/s]
------------------------------
Streaming batch_size=256
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:01<00:00, 2554.43it/s]
(diminishing returns, saturated the GPU)
```
|
|
|
Example where it's mostly a slowdown:
|
```python
from torch.utils.data import Dataset


class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        # Every 64th sample is 100x longer than the rest
        if i % 64 == 0:
            n = 100
        else:
            n = 1
        return "This is a test" * n
```
|
There is an occasional very long sentence compared to the others. In that case, the whole batch must be padded to the longest sequence, so the batch becomes [64, 400] instead of [64, 4], and most of that compute is wasted on padding, leading to a large slowdown.
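
A minimal sketch of the padding effect, assuming a standard BERT-style tokenizer (`bert-base-uncased` is just an illustrative choice): padding only extends each row up to the longest sequence in the batch, so a single long sample inflates every row.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

short = ["This is a test"] * 63
long_one = ["This is a test" * 100]

# All short sentences: every row is only ~6 tokens wide
print(tokenizer(short, padding=True, return_tensors="pt")["input_ids"].shape)

# Adding one long sentence pads every row out to roughly 400 tokens
print(tokenizer(short + long_one, padding=True, return_tensors="pt")["input_ids"].shape)
```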