Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
Process the image with an image processor and place the pixel_values on a GPU:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # use GPU if available, otherwise use a CPU
encoding = image_processor(image, return_tensors="pt")
pixel_values = encoding.pixel_values.to(device)
Pass your input to the model and return the logits:
outputs = model(pixel_values=pixel_values)
logits = outputs.logits.cpu()
Next, rescale the logits to the original image size:
upsampled_logits = nn.functional.interpolate(
logits,
size=image.size[::-1],
mode="bilinear",
align_corners=False,
)
pred_seg = upsampled_logits.argmax(dim=1)[0]
Load an image processor to preprocess the image and return the input as TensorFlow tensors:
from transformers import AutoImageProcessor
image_processor = AutoImageProcessor.from_pretrained("MariaK/scene_segmentation")
inputs = image_processor(image, return_tensors="tf")
Pass your input to the model and return the logits:
from transformers import TFAutoModelForSemanticSegmentation
model = TFAutoModelForSemanticSegmentation.from_pretrained("MariaK/scene_segmentation")
logits = model(**inputs).logits
Next, rescale the logits to the original image size and apply argmax on the class dimension:
logits = tf.transpose(logits, [0, 2, 3, 1])
upsampled_logits = tf.image.resize(
logits,
# We reverse the shape of image because image.size returns width and height.