[OwlViTImageProcessor] can be used to resize, rescale, and normalize images for the model, and [CLIPTokenizer] is used to encode the text.
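
To make the rescale and normalize steps concrete, here is a minimal plain-Python sketch of the arithmetic applied to each pixel. It is an illustration, not the [OwlViTImageProcessor] implementation: the helper names `rescale_and_normalize` and `preprocess` are hypothetical, the resize step is left out, and the mean and standard deviation are assumed to be the CLIP statistics, so check the processor configuration for the values actually in use.

```python
# Illustrative sketch only: plain-Python rescale + normalize.
# This is NOT the OwlViTImageProcessor code, and it skips the resize step.

# Assumed per-channel statistics (the CLIP values); verify against the
# processor configuration of the checkpoint you use.
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)


def rescale_and_normalize(pixel, mean=CLIP_MEAN, std=CLIP_STD, factor=1 / 255):
    """Map one RGB pixel from 0..255 integers to normalized floats.

    Rescale multiplies each channel by `factor` (0..255 becomes 0..1);
    normalize then subtracts the channel mean and divides by the channel std.
    """
    return tuple((c * factor - m) / s for c, m, s in zip(pixel, mean, std))


def preprocess(image):
    """Apply rescale_and_normalize to every pixel of a row-major RGB image."""
    return [[rescale_and_normalize(px) for px in row] for row in image]


# A tiny 2x2 RGB image given as nested lists of (R, G, B) tuples.
image = [
    [(0, 0, 0), (255, 255, 255)],
    [(128, 64, 32), (10, 200, 90)],
]
pixel_values = preprocess(image)
```

Rescaling and normalizing are separate operations from resizing: the first two change pixel values, while resizing changes the image dimensions to the size the model expects.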