It can be instructed to predict the most relevant text snippet for a given audio clip, without being directly optimized for the task. |
It can be instructed to predict the most relevant text snippet for a given audio clip, without being directly optimized for the task. |