OWL-ViT is an open-vocabulary object detection network trained on a variety of (image, text) pairs.