|
", |
|
] |
|
encoded_inputs = tokenizer(batch_sentences) |
|
print(encoded_inputs) |
|
{'input_ids': [[101, 1252, 1184, 1164, 1248, 6462, 136, 102],
               [101, 1790, 112, 189, 1341, 1119, 3520, 1164, 1248, 6462, 117, 21902, 1643, 119, 102],
               [101, 1327, 1164, 5450, 23434, 136, 102]],
 'token_type_ids': [[0, 0, 0, 0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 0, 0, 0]],
 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1],
                    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                    [1, 1, 1, 1, 1, 1, 1]]}
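Notice that the three lists in input_ids have different lengths (8, 15, and 7 tokens). As a quick sanity check, not part of the original example, you could print those lengths directly:

# Each encoded sentence has a different number of tokens,
# so the lists can't be stacked into a single tensor as-is.
print([len(ids) for ids in encoded_inputs["input_ids"]])
# [8, 15, 7]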
|
|
|
Pad
|
Sentences aren't always the same length, which can be an issue because tensors, the model inputs, need to have a uniform shape.
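As a minimal sketch of the idea (reusing the tokenizer and batch_sentences from above; padded_inputs is just an illustrative variable name), passing padding=True asks the tokenizer to pad every sequence to the length of the longest one in the batch:

# Pad shorter sentences with the tokenizer's padding token so every
# sequence in the batch ends up with the same length (15 here).
padded_inputs = tokenizer(batch_sentences, padding=True)
print([len(ids) for ids in padded_inputs["input_ids"]])
# [15, 15, 15]

The returned attention_mask then marks the added padding positions with 0 so the model can ignore them.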