File size: 326 Bytes
5fa1a76
 
 
 
 
 
1
2
3
4
5
6
encoding = processor(image, question, return_tensors="pt")
print(encoding.keys())
dict_keys(['input_ids', 'token_type_ids', 'attention_mask', 'bbox', 'image'])

Use case 5: visual question answering (inference), apply_ocr=False
For visual question answering tasks (such as DocVQA), you can provide a question to the processor.