```py
dataset["train"].features
```
Here's what the individual fields represent:
* id: the example's id
* image: a PIL.Image.Image object containing the document image
* query: the question string, a natural-language question provided in several languages
* answers: a list of correct answers provided by human annotators
* words and bounding_boxes: the results of OCR, which we will not use here
* answer: an answer matched by a different model, which we will not use here
Let's keep only the English questions and drop the answer feature, which appears to contain predictions by another model.
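The following is a minimal, library-free sketch of that cleanup step on a few toy records. It assumes (not stated above) that each example's `query` is a dict keyed by language code, such as `{"en": ..., "de": ...}`. The function name `keep_english_and_drop_answer`, the new `question` field, and the toy values are all hypothetical and chosen for illustration only. With 🤗 Datasets you would typically express the same idea with `Dataset.map` (using its `remove_columns` argument) and `Dataset.remove_columns`.

```py
from typing import Any


def keep_english_and_drop_answer(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Replace the multilingual `query` with an English-only `question` and drop `answer`.

    Hypothetical assumption: each record's `query` is a dict keyed by language
    code, e.g. {"en": "...", "de": "..."}.
    """
    cleaned = []
    for record in records:
        english = record["query"].get("en")
        if english is None:
            # No English version of this question: skip the example.
            continue
        # Copy every field except `query` (replaced below) and `answer` (another model's predictions).
        new_record = {k: v for k, v in record.items() if k not in ("query", "answer")}
        new_record["question"] = english
        cleaned.append(new_record)
    return cleaned


# Toy records with made-up values, for illustration only.
toy = [
    {"id": "a", "query": {"en": "What is the date?", "de": "Was ist das Datum?"},
     "answers": ["1 May"], "answer": {"text": "May"}},
    {"id": "b", "query": {"fr": "Quel est le total ?"}, "answers": ["42"], "answer": None},
]
print(keep_english_and_drop_answer(toy))
# → [{'id': 'a', 'answers': ['1 May'], 'question': 'What is the date?'}]
```

Skipping examples that lack an English question is a design choice in this sketch. If every example is known to include an `"en"` entry, a plain `record["query"]["en"]` lookup is enough.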