Specify a maximum sample length, and the feature extractor will either pad or truncate the sequences to match it:

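def preprocess_function(examples):
    # Collect the raw waveform of every example in the batch.
    audio_arrays = [x["array"] for x in examples["audio"]]
    # feature_extractor is assumed to have been loaded earlier.
    inputs = feature_extractor(
        audio_arrays,
        sampling_rate=16000,  # sampling rate of the input audio, in Hz
        padding=True,  # pad shorter samples up to the longest one in the batch
        max_length=100000,  # maximum number of values kept per sample
        truncation=True,  # cut samples longer than max_length
    )
    return inputs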
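
To make the pad-or-truncate behavior concrete, here is a minimal pure-Python sketch of the idea. It is not the library's implementation: pad_or_truncate and padding_value are hypothetical names chosen for illustration, and the sketch assumes right-padding with zeros up to the longest truncated sequence in the batch.

```python
from typing import List, Sequence


def pad_or_truncate(
    sequences: Sequence[Sequence[float]],
    max_length: int,
    padding_value: float = 0.0,
) -> List[List[float]]:
    """Truncate each sequence to max_length, then right-pad to the longest one left."""
    # Step 1 (truncation): drop every value past max_length.
    truncated = [list(seq[:max_length]) for seq in sequences]
    # Step 2 (padding): pad up to the longest truncated sequence in the batch.
    target = max((len(seq) for seq in truncated), default=0)
    return [seq + [padding_value] * (target - len(seq)) for seq in truncated]


# Three toy "waveforms" of lengths 3, 8 and 5, with a maximum length of 6.
batch = [[0.1] * 3, [0.2] * 8, [0.3] * 5]
padded = pad_or_truncate(batch, max_length=6)
print([len(seq) for seq in padded])  # [6, 6, 6]
```

The 8-value sample is cut to 6 values, and the two shorter samples are padded with zeros to that same length.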

Apply the preprocess_function to the first few examples in the dataset:

processed_dataset = preprocess_function(dataset[:5])

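All samples in the batch now have the same length. Because padding=True pads to the longest sample in the batch, that length equals the specified max_length whenever at least one input is that long or longer; pass padding="max_length" to pad to max_length in every case.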
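
One way to confirm this is to collect the distinct sample lengths in the processed batch and check that only one remains. The sketch below is an illustration under stated assumptions: sample_lengths is a hypothetical helper, the "input_values" key is assumed (it is the key Wav2Vec2-style feature extractors use, which may not match yours), and fake_processed is a stand-in for processed_dataset so the snippet runs on its own.

```python
from typing import Mapping, Sequence, Set


def sample_lengths(
    batch: Mapping[str, Sequence[Sequence[float]]],
    key: str = "input_values",  # assumed output key; adjust for your feature extractor
) -> Set[int]:
    """Return the set of distinct sample lengths found under batch[key]."""
    return {len(sample) for sample in batch[key]}


# Stand-in for processed_dataset: three samples already padded to 10 values.
fake_processed = {"input_values": [[0.0] * 10, [0.5] * 10, [1.0] * 10]}

lengths = sample_lengths(fake_processed)
# A single distinct length means every sample was padded or truncated alike.
print(lengths)  # {10}
```

With the real output, the call would be sample_lengths(processed_dataset), and the single value in the set can then be compared against max_length.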