def select_speaker(speaker_id):
    return 100 <= speaker_counts[speaker_id] <= 400

dataset = dataset.filter(select_speaker, input_columns=["speaker_id"])
Let's check how many speakers remain:
len(set(dataset["speaker_id"]))
42
Let's see how many examples are left:
len(dataset)
9973
You are left with just under 10,000 examples from 42 unique speakers, which should be sufficient.
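To make the filtering rule concrete, here is a minimal, self-contained sketch of the same predicate applied to a toy list of speaker IDs instead of the real dataset. The IDs and counts are invented for illustration, and building `speaker_counts` with `collections.Counter` is an assumption about how the per-speaker counts could be computed.

```python
from collections import Counter

# Toy stand-in for dataset["speaker_id"]; these IDs and counts are made up.
speaker_ids = ["spk_a"] * 150 + ["spk_b"] * 20 + ["spk_c"] * 400 + ["spk_d"] * 900

# Number of examples per speaker (assumed way of building speaker_counts).
speaker_counts = Counter(speaker_ids)


def select_speaker(speaker_id):
    # Keep speakers with between 100 and 400 examples, bounds inclusive.
    return 100 <= speaker_counts[speaker_id] <= 400


# Plain-Python equivalent of filtering row by row on the speaker_id column.
kept_ids = [s for s in speaker_ids if select_speaker(s)]
remaining_speakers = sorted(set(kept_ids))

print(remaining_speakers)
print(len(kept_ids))
```

Because both bounds are inclusive, a speaker with exactly 400 examples is kept, while speakers with too few or too many examples are dropped together with all of their rows.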