File size: 1,262 Bytes
5fa1a76
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Preprocess
The next step is to load a Wav2Vec2 processor to process the audio signal:

from transformers import AutoProcessor
processor = AutoProcessor.from_pretrained("facebook/wav2vec2-base")

The MInDS-14 dataset has a sampling rate of 8000kHz (you can find this information in its dataset card), which means you'll need to resample the dataset to 16000kHz to use the pretrained Wav2Vec2 model:

minds = minds.cast_column("audio", Audio(sampling_rate=16_000))
minds["train"][0]
{'audio': {'array': array([-2.38064706e-04, -1.58618059e-04, -5.43987835e-06, ,
          2.78103951e-04,  2.38446111e-04,  1.18740834e-04], dtype=float32),
  'path': '/root/.cache/huggingface/datasets/downloads/extracted/f14948e0e84be638dd7943ac36518a4cf3324e8b7aa331c5ab11541518e9368c/en-US~APP_ERROR/602ba9e2963e11ccd901cd4f.wav',
  'sampling_rate': 16000},
 'path': '/root/.cache/huggingface/datasets/downloads/extracted/f14948e0e84be638dd7943ac36518a4cf3324e8b7aa331c5ab11541518e9368c/en-US~APP_ERROR/602ba9e2963e11ccd901cd4f.wav',
 'transcription': "hi I'm trying to use the banking app on my phone and currently my checking and savings account balance is not refreshing"}

As you can see in the transcription above, the text contains a mix of upper and lowercase characters.