cosmo3769/finetuned_paligemma_vqav2_small

Generated from Trainer

cosmo3769 committed on Oct 15, 2024

Commit 6e8c33b (verified) · 1 Parent(s): 06c116f

Update README.md

Files changed (1)

README.md +22 -12

@@ -9,26 +9,36 @@ model-index:
   results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 # finetuned_paligemma_vqav2_small
-This model is a fine-tuned version of [google/paligemma-3b-pt-224](https://huggingface.co/google/paligemma-3b-pt-224) on an unknown dataset.
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
 ### Training hyperparameters
@@ -46,7 +56,7 @@ The following hyperparameters were used during training:
 ### Training results
 ### Framework versions

   results: []
 ---
 # finetuned_paligemma_vqav2_small
+This model is a fine-tuned version of [google/paligemma-3b-pt-224](https://huggingface.co/google/paligemma-3b-pt-224) on a small subset of the
+[VQAv2 dataset](https://huggingface.co/datasets/merve/vqav2-small) by [Merve](https://huggingface.co/merve).
+## How to Use?
+```python
+import torch
+import requests
+from PIL import Image
+from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
+pretrained_model_id = "google/paligemma-3b-pt-224"
+finetuned_model_id = "pyimagesearch/finetuned_paligemma_vqav2_small"
+processor = AutoProcessor.from_pretrained(pretrained_model_id)
+finetuned_model = PaliGemmaForConditionalGeneration.from_pretrained(finetuned_model_id)
+prompt = "What is behind the cat?"
+image_file = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cat.png?download=true"
+raw_image = Image.open(requests.get(image_file, stream=True).raw)
+inputs = processor(raw_image.convert("RGB"), prompt, return_tensors="pt")
+output = finetuned_model.generate(**inputs, max_new_tokens=20)
+print(processor.decode(output[0], skip_special_tokens=True)[len(prompt):])
+# gramophone
+```
 ### Training hyperparameters
 ### Training results
+![unnamed.png](https://cdn-uploads.huggingface.co/production/uploads/62818ecf52815a0dc73c6f1e/JvIRYy9_5efTQqo0S8PcB.png)
 ### Framework versions
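
Note on the added "How to Use?" snippet: its last line slices the decoded string with `[len(prompt):]`, which assumes the decoded text begins with the prompt and may leave a separator newline in front of the answer. Below is a minimal sketch of a more defensive post-processing step. `extract_answer` is a hypothetical helper, not part of the README or of `transformers`, and the sample string is made up to stand in for the result of `processor.decode(output[0], skip_special_tokens=True)`.

```python
def extract_answer(decoded: str, prompt: str) -> str:
    """Return the generated answer without the echoed prompt.

    Assumption: the decoded text may (but need not) start with the prompt.
    """
    # Drop the prompt only when it really is a prefix of the decoded text.
    if decoded.startswith(prompt):
        decoded = decoded[len(prompt):]
    # Remove the separator newline and any surrounding whitespace.
    return decoded.strip()


# Made-up decoded string standing in for the processor.decode(...) result.
decoded = "What is behind the cat?\ngramophone"
print(extract_answer(decoded, "What is behind the cat?"))
# gramophone
```

In the README snippet this would replace the `[len(prompt):]` slice, for example `print(extract_answer(processor.decode(output[0], skip_special_tokens=True), prompt))`.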