In this guide you'll learn how to:
create a depth estimation pipeline
run depth estimation inference by hand
Before you begin, make sure you have all the necessary libraries installed:
pip install -q transformers
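The examples below also import torch, PIL, requests, and numpy. If these aren't already in your environment, an install along these lines should cover them:
pip install -q torch Pillow requests numpy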
Depth estimation pipeline
The simplest way to try out inference with a model supporting depth estimation is to use the corresponding pipeline.
Instantiate a pipeline from a checkpoint on the Hugging Face Hub:
from transformers import pipeline
checkpoint = "vinvino02/glpn-nyu"
depth_estimator = pipeline("depth-estimation", model=checkpoint)
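By default the pipeline runs on CPU. If you have a GPU available, you can pass a device index when creating it; a minimal sketch, assuming a CUDA-enabled setup:
depth_estimator = pipeline("depth-estimation", model=checkpoint, device=0)  # device=0 selects the first GPU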
Next, choose an image to analyze:
from PIL import Image
import requests
url = "https://unsplash.com/photos/HwBAsSbPBDU/download?ixid=MnwxMjA3fDB8MXxzZWFyY2h8MzR8fGNhciUyMGluJTIwdGhlJTIwc3RyZWV0fGVufDB8MHx8fDE2Nzg5MDEwODg&force=true&w=640"
image = Image.open(requests.get(url, stream=True).raw)
image
Pass the image to the pipeline.
predictions = depth_estimator(image)
The pipeline returns a dictionary with two entries. The first, predicted_depth, is a tensor whose values give the depth in meters for each pixel.
The second, depth, is a PIL image that visualizes the depth estimation result.
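Before visualizing, it can help to confirm what each entry contains; a quick sanity check (the exact tensor shape depends on the checkpoint's preprocessing):
print(type(predictions["predicted_depth"]))  # torch.Tensor
print(predictions["predicted_depth"].shape)  # e.g. (1, height, width)
print(type(predictions["depth"]))            # PIL.Image.Image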
Let's take a look at the visualized result:
predictions["depth"]
Depth estimation inference by hand
Now that you've seen how to use the depth estimation pipeline, let's see how we can replicate the same result by hand.
Start by loading the model and associated processor from a checkpoint on the Hugging Face Hub.
Here we'll use the same checkpoint as before:
from transformers import AutoImageProcessor, AutoModelForDepthEstimation
checkpoint = "vinvino02/glpn-nyu"
image_processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForDepthEstimation.from_pretrained(checkpoint)
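Optionally, you can move the model to a GPU before running inference. A minimal sketch, assuming a CUDA-enabled PyTorch build; if you do this, move pixel_values to the same device below as well:
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU when no GPU is present
model.to(device)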
Prepare the image input for the model using the image_processor, which takes care of the necessary transformations
such as resizing and normalization:
pixel_values = image_processor(image, return_tensors="pt").pixel_values
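As a quick sanity check, you can inspect the tensor the processor produced; the exact height and width depend on the checkpoint's preprocessing settings:
print(pixel_values.shape)  # torch.Size([1, 3, height, width]): batch, channels, height, width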
Pass the prepared inputs through the model:
import torch
with torch.no_grad():
    outputs = model(pixel_values)
    predicted_depth = outputs.predicted_depth
Visualize the results:
import numpy as np
# interpolate to original size
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
).squeeze()
output = prediction.numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
depth
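If you want to keep the result, the depth map is a regular PIL image and can be written to disk; the filename here is an arbitrary choice:
depth.save("depth_map.png")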