Specify a maximum sample length, and the feature extractor will either pad or truncate the sequences to match it: def preprocess_function(examples): audio_arrays = [x["array"] for x in examples["audio"]] inputs = feature_extractor( audio_arrays, sampling_rate=16000, padding=True, max_length=100000, truncation=True, ) return inputs Apply the preprocess_function to the first few examples in the dataset: processed_dataset = preprocess_function(dataset[:5]) The sample lengths are now the same and match the specified maximum length.