	Here is how you can create a function to realign the tokens and labels, and truncate sequences to be no longer than DistilBERT's maximum input length:

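	def tokenize_and_align_labels(examples):
	    # `tokenizer` is assumed to be the fast DistilBERT tokenizer loaded earlier;
	    # word_ids() is only available on fast tokenizers.
	    # The inputs are already split into words, and truncation=True cuts each
	    # sequence down to the model's maximum input length.
	    tokenized_inputs = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)

	    labels = []
	    for i, label in enumerate(examples["ner_tags"]):
	        word_ids = tokenized_inputs.word_ids(batch_index=i)  # Map tokens to their respective word.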
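	The excerpt stops right after each token is mapped to its word. The sketch below shows one common way to finish the realignment, and it rests on an assumed convention: special tokens (where `word_ids` gives `None`) and every subword after the first token of a word get the label -100, which is the default `ignore_index` of PyTorch's `torch.nn.CrossEntropyLoss`, so the loss skips them. The helper `align_labels` and the `IGNORE_INDEX` constant are hypothetical names, and the example inputs are made-up word ids, not real tokenizer output. The sketch is plain Python so it runs without downloading a model.

```python
from typing import List, Optional

# Assumed convention: -100 matches the default ignore_index of
# torch.nn.CrossEntropyLoss, so these positions do not contribute to the loss.
IGNORE_INDEX = -100


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Hypothetical helper: expand word-level labels to token-level labels.

    word_ids has one entry per token, as returned by
    tokenized_inputs.word_ids(batch_index=i): a word index, or None for
    special tokens such as [CLS] and [SEP].
    """
    label_ids = []
    previous_word_idx = None
    for word_idx in word_ids:
        if word_idx is None:
            # Special tokens are ignored by the loss.
            label_ids.append(IGNORE_INDEX)
        elif word_idx != previous_word_idx:
            # The first token of a word carries that word's label.
            label_ids.append(word_labels[word_idx])
        else:
            # Later subword tokens of the same word are ignored too.
            label_ids.append(IGNORE_INDEX)
        previous_word_idx = word_idx
    return label_ids


if __name__ == "__main__":
    # Made-up entry: [CLS], two subword tokens of word 0, one token of word 1, [SEP].
    print(align_labels([None, 0, 0, 1, None], [3, 4]))  # [-100, 3, -100, 4, -100]
```

	Inside the loop, one way to use it would be `labels.append(align_labels(word_ids, label))`, followed by setting `tokenized_inputs["labels"] = labels` and returning `tokenized_inputs`. Masking later subwords keeps exactly one prediction per word. An alternative design gives every subword its word's label, which changes how token-level predictions map back to words at evaluation time.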