Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
Image captioning is an example of a multimodal task where the model takes an image as input and outputs a sequence of text describing the image or some properties of the image.