When deciding which feature to use as the text input, consider that the SpeechT5 tokenizer doesn't have any tokens for numbers.
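One way to make this check concrete is to compare the characters in each candidate text feature against the tokenizer's vocabulary. The sketch below is only illustrative: `unsupported_chars`, `toy_vocab`, and the two sample sentences are hypothetical names and data, and the toy vocabulary merely stands in for the real one. With the actual tokenizer, the vocabulary could presumably be obtained from something like `tokenizer.get_vocab()`, which is an assumption not shown here.

```python
import string


def unsupported_chars(texts, vocab):
    """Return the sorted characters in `texts` that have no token in `vocab`."""
    seen = set()
    for text in texts:
        seen.update(text)
    # Spaces are ignored here because SentencePiece-style vocabularies
    # typically encode them with a marker symbol rather than a literal space.
    seen.discard(" ")
    return sorted(seen - set(vocab))


# Hypothetical toy vocabulary: lowercase letters and a little punctuation,
# deliberately without digits, to mimic the limitation described above.
toy_vocab = set(string.ascii_lowercase) | {",", ".", "'"}

# Hypothetical examples of a raw text feature and a normalized one.
raw = ["the budget grew by 25 percent in 2019."]
normalized = ["the budget grew by twenty five percent in twenty nineteen."]

missing_raw = unsupported_chars(raw, toy_vocab)
missing_normalized = unsupported_chars(normalized, toy_vocab)

print(missing_raw)         # digits that the vocabulary cannot represent
print(missing_normalized)  # empty when every character is covered
```

A feature whose list of missing characters is empty is the safer choice for the text input, since nothing in it falls outside what the tokenizer can represent.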