File size: 1,108 Bytes
5fa1a76 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
", return_tensors="pt") this is a custom function that returns the visual embeddings given the image path visual_embeds = get_visual_embeddings(image_path) visual_token_type_ids = torch.ones(visual_embeds.shape[:-1], dtype=torch.long) visual_attention_mask = torch.ones(visual_embeds.shape[:-1], dtype=torch.float) inputs.update( { "visual_embeds": visual_embeds, "visual_token_type_ids": visual_token_type_ids, "visual_attention_mask": visual_attention_mask, } ) outputs = model(**inputs) last_hidden_state = outputs.last_hidden_state VisualBertConfig [[autodoc]] VisualBertConfig VisualBertModel [[autodoc]] VisualBertModel - forward VisualBertForPreTraining [[autodoc]] VisualBertForPreTraining - forward VisualBertForQuestionAnswering [[autodoc]] VisualBertForQuestionAnswering - forward VisualBertForMultipleChoice [[autodoc]] VisualBertForMultipleChoice - forward VisualBertForVisualReasoning [[autodoc]] VisualBertForVisualReasoning - forward VisualBertForRegionToPhraseAlignment [[autodoc]] VisualBertForRegionToPhraseAlignment - forward |