Crop a random part of the image, resize it, and normalize it with the image mean and standard deviation: from torchvision.transforms import RandomResizedCrop, Compose, Normalize, ToTensor normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std) size = ( image_processor.size["shortest_edge"] if "shortest_edge" in image_processor.size else (image_processor.size["height"], image_processor.size["width"]) ) _transforms = Compose([RandomResizedCrop(size), ToTensor(), normalize]) Then create a preprocessing function to apply the transforms and return the pixel_values - the inputs to the model - of the image: def transforms(examples): examples["pixel_values"] = [_transforms(img.convert("RGB")) for img in examples["image"]] del examples["image"] return examples To apply the preprocessing function over the entire dataset, use 🤗 Datasets [~datasets.Dataset.with_transform] method.