Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
audio_input = [dataset[0]["audio"]["array"]]
feature_extractor(audio_input, sampling_rate=16000)
{'input_values': [array([ 3.8106556e-04, 2.7506407e-03, 2.8015103e-03, ,
5.6335266e-04, 4.6588284e-06, -1.7142107e-04], dtype=float32)]}
Just like the tokenizer, you can apply padding or truncation to handle variable sequences in a batch.