Extracts the input_values from the audio file and tokenizes the transcription column with the processor.
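
As a rough illustration of this step, the sketch below passes the decoded audio and its transcription through a processor in one call. It is a minimal sketch under assumptions, not the definitive implementation: the function name `prepare_example` is hypothetical, the example is assumed to hold an "audio" dict with "array" and "sampling_rate" keys next to a "transcription" string, and the processor is assumed to behave like a Wav2Vec2-style processor that accepts `audio`, `sampling_rate`, and `text` keyword arguments and returns `input_values` and `labels`.

```python
from typing import Any, Callable, Dict


def prepare_example(
    example: Dict[str, Any],
    processor: Callable[..., Dict[str, Any]],
) -> Dict[str, Any]:
    """Turn one dataset example into model inputs and tokenized labels.

    Assumes (not confirmed by the source) that `example` looks like:
        {"audio": {"array": ..., "sampling_rate": ...}, "transcription": "..."}
    and that `processor` returns a mapping with "input_values" and "labels".
    """
    audio = example["audio"]

    # One call does both jobs: feature extraction on the raw waveform and
    # tokenization of the transcription text.
    features = processor(
        audio=audio["array"],
        sampling_rate=audio["sampling_rate"],
        text=example["transcription"],
    )

    # The processor is assumed to return a batch of size one for the audio,
    # so take the first (and only) entry.
    return {
        "input_values": features["input_values"][0],
        "labels": features["labels"],
    }
```

With a real processor and dataset, a function like this would typically be applied to each example (for instance through the dataset's `map` method, with the processor bound beforehand), but the exact column names depend on the dataset in use.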