Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
def collate_fn(batch):
pixel_values = [item["pixel_values"] for item in batch]
encoding = image_processor.pad(pixel_values, return_tensors="pt")
labels = [item["labels"] for item in batch]
batch = {}
batch["pixel_values"] = encoding["pixel_values"]
batch["pixel_mask"] = encoding["pixel_mask"]
batch["labels"] = labels
return batch
Multimodal
For tasks involving multimodal inputs, you'll need a processor to prepare your dataset for the model.