Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
raw
history blame contribute delete
602 Bytes
Specify a maximum sample length, and the feature extractor will either pad or truncate the sequences to match it:
def preprocess_function(examples):
audio_arrays = [x["array"] for x in examples["audio"]]
inputs = feature_extractor(
audio_arrays,
sampling_rate=16000,
padding=True,
max_length=100000,
truncation=True,
)
return inputs
Apply the preprocess_function to the first few examples in the dataset:
processed_dataset = preprocess_function(dataset[:5])
The sample lengths are now the same and match the specified maximum length.