It can be instructed to predict the most relevant text snippet for a given audio clip, without being directly optimized for that task.
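The zero-shot selection described above amounts to comparing one audio embedding against several text embeddings in a shared space and picking the closest. Below is a minimal sketch of that ranking step. It assumes the audio and text encoders have already produced embeddings of the same dimension. The function names, the `temperature` parameter, and the toy vectors are hypothetical illustrations, not the model's actual API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_snippets(audio_emb, text_embs, temperature=1.0):
    """Score each text embedding against the audio embedding.

    Hypothetical helper: returns the index of the best-matching snippet
    and a softmax distribution over all snippets. `temperature` scales
    the similarities before the softmax (an assumed knob for illustration).
    """
    logits = [cosine(audio_emb, t) / temperature for t in text_embs]
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Toy, made-up embeddings standing in for real encoder outputs.
snippets = ["a dog barking", "rain falling", "piano music"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
audio_emb = [0.9, 0.1, 0.0]

best, probs = rank_snippets(audio_emb, text_embs)
print(snippets[best])
```

Because the candidate snippets are just strings passed through the text encoder, the same ranking can be reused for a new label set without any task-specific training. That is the sense in which the task is never directly optimized.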