Vision Encoder Decoder Models
Overview
The [VisionEncoderDecoderModel] can be used to initialize an image-to-text model with any
pretrained Transformer-based vision model as the encoder (e.g.