```py
from datasets import load_dataset, Audio

minds = load_dataset("PolyAI/minds14", name="en-US", split="train[:100]")
```
Split the dataset's train split into a train and test set with the [`~Dataset.train_test_split`] method:
```py
minds = minds.train_test_split(test_size=0.2)
```
Then take a look at the dataset:
```py
minds
```

```
DatasetDict({
    train: Dataset({
        features: ['path', 'audio', 'transcription', 'english_transcription', 'intent_class', 'lang_id'],
        num_rows: 80
    })
    test: Dataset({
        features: ['path', 'audio', 'transcription', 'english_transcription', 'intent_class', 'lang_id'],
        num_rows: 20
    })
})
```
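You can also index into a split to inspect a single example. The `audio` column is decoded lazily on access into a dict holding the raw waveform (`array`), its `sampling_rate`, and the file `path`; a quick sketch:

```py
# Decode one training example; the audio column is only decoded on access.
example = minds["train"][0]

# The transcription is plain text; the audio entry is a dict with the
# raw waveform ("array"), its "sampling_rate", and the source "path".
print(example["transcription"])
print(example["audio"]["sampling_rate"])
```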
While the dataset contains a lot of useful information, like `lang_id` and `english_transcription`, you'll focus on the `audio` and `transcription` columns in this guide.
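Since only those two columns are needed, one reasonable cleanup step is to drop the rest with [`~Dataset.remove_columns`] and, since the `Audio` feature class was imported earlier, resample the audio on the fly with [`~Dataset.cast_column`]. A minimal sketch, assuming a downstream model that expects 16 kHz input:

```py
# Drop the columns this guide doesn't use.
minds = minds.remove_columns(["english_transcription", "intent_class", "lang_id"])

# Resample lazily to 16 kHz on access; adjust the rate to whatever your
# feature extractor expects (16_000 here is an assumption).
minds = minds.cast_column("audio", Audio(sampling_rate=16_000))
```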