It can be instructed in natural language to predict the most relevant text snippet for a given image, without being directly optimized for that task, similar to the zero-shot capabilities of GPT-2 and GPT-3.
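The sketch below shows one way "predict the most relevant text snippet, given an image" can work zero-shot. It makes several assumptions the text above does not state. It assumes the model maps images and text into a shared embedding space, and that candidate snippets are scored by cosine similarity, scaled by a temperature and passed through a softmax. The embeddings are hand-picked toy vectors, and `rank_snippets` and `temperature` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_snippets(
    image_emb: Sequence[float],
    snippet_embs: Dict[str, Sequence[float]],
    temperature: float = 0.01,  # hypothetical value; sharpens the distribution
) -> List[Tuple[str, float]]:
    """Rank candidate text snippets by probability of matching the image."""
    labels = list(snippet_embs)
    logits = [cosine_similarity(image_emb, snippet_embs[l]) / temperature for l in labels]
    probs = softmax(logits)
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


# Toy, hand-picked embeddings (hypothetical). A real model would produce these
# with its own image and text encoders.
image_embedding = [0.9, 0.1, 0.0]
snippet_embeddings = {
    "a photo of a dog": [1.0, 0.0, 0.0],
    "a photo of a cat": [0.0, 1.0, 0.0],
    "a diagram": [0.0, 0.0, 1.0],
}

ranking = rank_snippets(image_embedding, snippet_embeddings)
best_snippet = ranking[0][0]
print(best_snippet)  # prints "a photo of a dog"
```

The candidate snippets are plain strings supplied at query time. Under these assumptions, a new set of labels can be tried by encoding different text, with no task-specific training step. That is the sense in which the prediction is zero-shot.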