File size: 579 Bytes
5fa1a76
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
dataset["train"].features

Here's what the individual fields represent:
* id: the example's id
* image: a PIL.Image.Image object containing the document image
* query: the question string - natural language asked question, in several languages
* answers: a list of correct answers provided by human annotators
* words and bounding_boxes: the results of OCR, which we will not use here
* answer: an answer matched by a different model which we will not use here
Let's leave only English questions, and drop the answer feature which appears to contain predictions by another model.