In this guide you'll learn how to:
create a depth estimation pipeline
run depth estimation inference by hand
Before you begin, make sure you have all the necessary libraries installed:
pip install -q transformers
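The examples below also import torch, PIL, requests, and numpy. If these aren't already in your environment, an install along these lines should cover them:
pip install -q torch Pillow requests numpy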
Depth estimation pipeline
The simplest way to try out inference with a model supporting depth estimation is to use the corresponding pipeline.
Instantiate a pipeline from a checkpoint on the Hugging Face Hub:
from transformers import pipeline
checkpoint = "vinvino02/glpn-nyu"
depth_estimator = pipeline("depth-estimation", model=checkpoint)
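By default the pipeline runs on CPU. If you have a GPU available, you can pass a device index when creating it; a minimal sketch, assuming a CUDA-enabled setup:
depth_estimator = pipeline("depth-estimation", model=checkpoint, device=0)  # device=0 selects the first GPU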
Next, choose an image to analyze:
from PIL import Image
import requests
url = "https://unsplash.com/photos/HwBAsSbPBDU/download?ixid=MnwxMjA3fDB8MXxzZWFyY2h8MzR8fGNhciUyMGluJTIwdGhlJTIwc3RyZWV0fGVufDB8MHx8fDE2Nzg5MDEwODg&force=true&w=640"
image = Image.open(requests.get(url, stream=True).raw)
image
Pass the image to the pipeline.
predictions = depth_estimator(image)
The pipeline returns a dictionary with two entries. The first, predicted_depth, is a tensor whose values give the depth in meters for each pixel.
The second, depth, is a PIL image that visualizes the depth estimation result.
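Before visualizing, it can help to confirm what each entry contains; a quick sanity check (the exact tensor shape depends on the checkpoint's preprocessing):
print(type(predictions["predicted_depth"]))  # torch.Tensor
print(predictions["predicted_depth"].shape)  # e.g. (1, height, width)
print(type(predictions["depth"]))            # PIL.Image.Image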
Let's take a look at the visualized result:
predictions["depth"]
Depth estimation inference by hand
Now that you've seen how to use the depth estimation pipeline, let's see how we can replicate the same result by hand.
Start by loading the model and associated processor from a checkpoint on the Hugging Face Hub.
Here we'll use the same checkpoint as before:
from transformers import AutoImageProcessor, AutoModelForDepthEstimation
checkpoint = "vinvino02/glpn-nyu"
image_processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForDepthEstimation.from_pretrained(checkpoint)
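Optionally, you can move the model to a GPU before running inference. A minimal sketch, assuming a CUDA-enabled PyTorch build; if you do this, move pixel_values to the same device below as well:
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU when no GPU is present
model.to(device)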
Prepare the image input for the model using the image_processor, which takes care of the necessary transformations
such as resizing and normalization:
pixel_values = image_processor(image, return_tensors="pt").pixel_values
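As a quick sanity check, you can inspect the tensor the processor produced; the exact height and width depend on the checkpoint's preprocessing settings:
print(pixel_values.shape)  # torch.Size([1, 3, height, width]): batch, channels, height, width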
Pass the prepared inputs through the model:
import torch
with torch.no_grad():
    outputs = model(pixel_values)
    predicted_depth = outputs.predicted_depth
Visualize the results:
import numpy as np
# interpolate to original size
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
).squeeze()
output = prediction.numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
depth
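If you want to keep the result, the depth map is a regular PIL image and can be written to disk; the filename here is an arbitrary choice:
depth.save("depth_map.png")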