We perform MatCha pretraining starting from Pix2Struct, a recently proposed image-to-text visual language model.