result = {
    # Split the concatenated sequences into chunks of block_size tokens
    k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
    for k, t in concatenated_examples.items()
}
# For language modeling, the labels are a copy of the input ids
result["labels"] = result["input_ids"].copy()
return result
Apply the `group_texts` function over the entire dataset:
lm_dataset = tokenized_eli5.map(group_texts, batched=True, num_proc=4)
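As a quick sanity check, you can inspect one of the grouped examples; this snippet is only illustrative and assumes the dataset has a `train` split and that `block_size` was defined earlier in the guide:

```py
# Every grouped example should now contain exactly block_size token ids.
# Assumes lm_dataset has a "train" split and block_size was set earlier (e.g. 128).
print(len(lm_dataset["train"][0]["input_ids"]))
```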
Now create a batch of examples using [DataCollatorForLanguageModeling].
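A minimal sketch of that step is shown below; it assumes a causal language modeling setup and the `tokenizer` loaded earlier in the guide. For masked language modeling you would set `mlm=True` and an `mlm_probability` instead of `mlm=False`.

```py
from transformers import DataCollatorForLanguageModeling

# Sketch only: reuses the `tokenizer` from earlier in the guide.
# GPT-style tokenizers have no pad token, so reuse the end-of-sequence token for padding.
tokenizer.pad_token = tokenizer.eos_token

# mlm=False tells the collator to prepare inputs for causal language modeling,
# shifting the labels inside the model rather than masking tokens.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
```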