The authors pass these regions through a pre-trained CNN such as ResNet and use the
resulting features as visual embeddings.