It can be instructed in natural language to predict the most relevant text snippet for a given audio clip, without being directly optimized for that task.
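
As a rough illustration of what predicting the most relevant text snippet can look like in practice, the sketch below scores a few candidate snippets against one audio embedding using cosine similarity followed by a softmax. Everything in it is an assumption made for illustration: the three-dimensional embeddings are invented toy values rather than outputs of any real model, and the helper names `cosine` and `rank_snippets` and the `temperature` value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_snippets(audio_emb, text_embs, temperature=0.07):
    """Return a probability for each candidate snippet.

    `temperature` is an assumed value chosen for this sketch; a lower
    value makes the distribution sharper.
    """
    sims = {text: cosine(audio_emb, emb) for text, emb in text_embs.items()}
    # Subtract the maximum similarity before exponentiating for stability.
    top = max(sims.values())
    exps = {text: math.exp((s - top) / temperature) for text, s in sims.items()}
    total = sum(exps.values())
    return {text: value / total for text, value in exps.items()}


# Toy embeddings (hypothetical): in a real setup these would come from
# an audio encoder and a text encoder that share one embedding space.
audio_embedding = [0.9, 0.1, 0.2]
text_embeddings = {
    "a dog barking": [0.8, 0.2, 0.1],
    "rain falling": [0.1, 0.9, 0.3],
    "a car engine": [0.2, 0.1, 0.9],
}

probs = rank_snippets(audio_embedding, text_embeddings)
best = max(probs, key=probs.get)
print(best)
```

The candidate snippets act as the instruction: changing the list of texts changes the task, with no retraining involved.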