```python
import requests
from PIL import Image
import torch
from transformers import OwlViTProcessor, OwlViTForObjectDetection
processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = [["a photo of a cat", "a photo of a dog"]]
inputs = processor(text=texts, images=image, return_tensors="pt")
outputs = model(**inputs)
# Target image sizes (height, width) to rescale box predictions [batch_size, 2]
target_sizes = torch.Tensor([image.size[::-1]])
# Convert outputs (bounding boxes and class logits) to Pascal VOC format (xmin, ymin, xmax, ymax)
results = processor.post_process_object_detection(outputs=outputs, target_sizes=target_sizes, threshold=0.1)
i = 0 # Retrieve predictions for the first image for the corresponding text queries
text = texts[i]
boxes, scores, labels = results[i]["boxes"], results[i]["scores"], results[i]["labels"]
for box, score, label in zip(boxes, scores, labels):
    box = [round(i, 2) for i in box.tolist()]
    print(f"Detected {text[label]} with confidence {round(score.item(), 3)} at location {box}")
# Expected output:
# Detected a photo of a cat with confidence 0.707 at location [324.97, 20.44, 640.58, 373.29]
# Detected a photo of a cat with confidence 0.717 at location [1.46, 55.26, 315.55, 472.17]
```
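To sanity-check the predictions visually, you can draw the returned boxes back onto the image. A minimal sketch using Pillow's `ImageDraw` (the output filename is arbitrary):

```python
from PIL import ImageDraw

# Draw each detected box and its matched text query on a copy of the input image.
annotated = image.copy()
draw = ImageDraw.Draw(annotated)
for box, score, label in zip(boxes, scores, labels):
    xmin, ymin, xmax, ymax = box.tolist()
    draw.rectangle((xmin, ymin, xmax, ymax), outline="red", width=3)
    draw.text((xmin, ymin), f"{text[label]}: {round(score.item(), 2)}", fill="red")
annotated.save("owlvit_detections.png")  # arbitrary output path
```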
## Resources
A demo notebook on using OWL-ViT for zero- and one-shot (image-guided) object detection can be found here.
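For the image-guided (one-shot) case, the model exposes `image_guided_detection`, which takes a query image instead of text prompts. A minimal sketch, assuming the query image URL below is only illustrative:

```python
import requests
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
# Query image containing the object to search for (illustrative URL).
query_url = "http://images.cocodataset.org/val2017/000000001675.jpg"
query_image = Image.open(requests.get(query_url, stream=True).raw)

# Pass query_images instead of text; the model matches regions against the query image.
inputs = processor(images=image, query_images=query_image, return_tensors="pt")
with torch.no_grad():
    outputs = model.image_guided_detection(**inputs)

target_sizes = torch.Tensor([image.size[::-1]])
results = processor.post_process_image_guided_detection(outputs=outputs, target_sizes=target_sizes)
boxes, scores = results[0]["boxes"], results[0]["scores"]
for box, score in zip(boxes, scores):
    print(f"Detected match with confidence {round(score.item(), 3)} at {[round(c, 2) for c in box.tolist()]}")
```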