|
Process the image with an image processor and move the pixel_values to a GPU if one is available (otherwise they stay on the CPU): |
|
|
|
import torch |
from torch import nn |
|
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # use the GPU if available, otherwise the CPU |
|
encoding = image_processor(image, return_tensors="pt") |
|
pixel_values = encoding.pixel_values.to(device) |
|
|
|
Pass your input to the model and return the logits: |
|
|
|
with torch.no_grad():  # inference only, so gradient tracking is not needed |
    outputs = model(pixel_values=pixel_values) |
|
logits = outputs.logits.cpu() |
|
|
|
Next, rescale the logits to the original image size and apply argmax on the class dimension: |
|
|
|
upsampled_logits = nn.functional.interpolate( |
|
    logits, |
|
    size=image.size[::-1],  # image.size is (width, height); size expects (height, width) |
|
    mode="bilinear", |
|
    align_corners=False, |
|
) |
|
pred_seg = upsampled_logits.argmax(dim=1)[0] |
|
|
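The two ideas in the step above can be sketched in plain Python, without any deep learning library. This is only an illustration on made-up toy values (`image_size`, `toy_logits`, and the helper `argmax_over_classes` are hypothetical names, not part of the model or of any library): reversing a PIL-style `(width, height)` size yields the `(height, width)` order that the resize call expects, and taking the argmax over the class dimension keeps, for each pixel, the index of the highest-scoring class.

```python
# Hypothetical toy values for illustration; they are not produced by the model above.
image_size = (640, 480)          # PIL-style size: (width, height)
target_size = image_size[::-1]   # reversed to (height, width)

# Toy logits for 3 classes on a 2x2 grid, laid out as [class][row][col].
toy_logits = [
    [[0.1, 2.0], [0.3, 0.2]],   # scores for class 0
    [[1.5, 0.4], [0.1, 0.9]],   # scores for class 1
    [[0.2, 0.3], [2.2, 0.5]],   # scores for class 2
]


def argmax_over_classes(logits):
    """Return a [row][col] map with the index of the highest-scoring class per pixel."""
    num_classes = len(logits)
    height, width = len(logits[0]), len(logits[0][0])
    return [
        [
            max(range(num_classes), key=lambda c: logits[c][row][col])
            for col in range(width)
        ]
        for row in range(height)
    ]


toy_pred_seg = argmax_over_classes(toy_logits)
print(target_size)    # (480, 640)
print(toy_pred_seg)   # [[1, 0], [2, 1]]
```

In the real pipeline the same per-pixel argmax is what `upsampled_logits.argmax(dim=1)` computes on the tensor, with the class scores stored along dimension 1.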
|
To run inference with TensorFlow instead, load an image processor to preprocess the image and return the input as TensorFlow tensors: |
|
|
|
from transformers import AutoImageProcessor |
|
image_processor = AutoImageProcessor.from_pretrained("MariaK/scene_segmentation") |
|
inputs = image_processor(image, return_tensors="tf") |
|
|
|
Pass your input to the model and return the logits: |
|
|
|
from transformers import TFAutoModelForSemanticSegmentation |
|
model = TFAutoModelForSemanticSegmentation.from_pretrained("MariaK/scene_segmentation") |
|
logits = model(**inputs).logits |
|
|
|
Next, rescale the logits to the original image size and apply argmax on the class dimension: |
|
|
|
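import tensorflow as tf |
|
# Move the class dimension from axis 1 to the last axis; tf.image.resize expects (batch, height, width, channels). |
logits = tf.transpose(logits, [0, 2, 3, 1]) |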
|
upsampled_logits = tf.image.resize( |
|
    logits, |
|
    # Reverse image.size because it returns (width, height), while resize expects (height, width). |