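```py
from transformers import AutoProcessor

# Assumption: the LayoutLMv2 checkpoint; skip this line if `processor` is already loaded.
processor = AutoProcessor.from_pretrained("microsoft/layoutlmv2-base-uncased")
image_processor = processor.image_processor


def get_ocr_words_and_boxes(examples):
    # With the default apply_ocr=True, the image processor runs Tesseract (via pytesseract)
    # and returns the recognized words and their normalized bounding boxes.
    images = [image.convert("RGB") for image in examples["image"]]
    encoded_inputs = image_processor(images)

    examples["image"] = encoded_inputs.pixel_values
    examples["words"] = encoded_inputs.words
    examples["boxes"] = encoded_inputs.boxes
    return examples
```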

To apply this preprocessing to the entire dataset efficiently, use [`~datasets.Dataset.map`] with `batched=True`, so that the function receives several examples per call.
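The function above is written for batches: it iterates over `examples["image"]` as a list. In the library, the call would look something like `dataset.map(get_ocr_words_and_boxes, batched=True, batch_size=2)`, where `dataset` is a placeholder name. The pure-Python sketch below only models what a batched map does: it slices the columns into dict-of-lists batches, calls the function on each, and concatenates the returned columns. `batched_map` and `fake_ocr` are hypothetical helpers, not part of 🤗 Datasets, and the fake OCR stands in for the image processor so the sketch runs without Tesseract or a model download.

```python
from typing import Callable, Dict, List

Batch = Dict[str, List]


def batched_map(columns: Batch, fn: Callable[[Batch], Batch], batch_size: int = 2) -> Batch:
    """Hypothetical model of a batched map; not the 🤗 Datasets implementation."""
    num_rows = len(next(iter(columns.values())))
    output: Batch = {}
    for start in range(0, num_rows, batch_size):
        # Slice every column to form one dict-of-lists batch.
        batch = {name: values[start:start + batch_size] for name, values in columns.items()}
        result = fn(batch)
        # Append each returned column to the accumulated output.
        for name, values in result.items():
            output.setdefault(name, []).extend(values)
    return output


def fake_ocr(examples: Batch) -> Batch:
    """Hypothetical stand-in for get_ocr_words_and_boxes: no model, no Tesseract."""
    examples["words"] = [[f"word-{img}"] for img in examples["image"]]
    examples["boxes"] = [[[0, 0, 10, 10]] for _ in examples["image"]]
    return examples


dataset = {"image": ["a", "b", "c"], "query": ["q1", "q2", "q3"]}
processed = batched_map(dataset, fake_ocr, batch_size=2)
print(processed["words"])  # [['word-a'], ['word-b'], ['word-c']]
```

Because the function returns the whole batch dict, existing columns such as `query` are kept alongside the new `words` and `boxes` columns.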