metadata
title: UI Screen Description Generator With Pix2Struct
emoji: 🐨
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.28.0
app_file: app.py
pinned: false
license: mit
short_description: Built a vision-language application
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
UI Screen Describer with Pix2Struct
This demo uses Google's pix2struct-screen2words-large model to turn UI screenshots into natural-language descriptions.
Use Cases
- Accessibility
- UI testing
- Auto documentation
How it works
Upload a screenshot (e.g., of an app, webpage, or dashboard), and the model generates a text description of it.
Built using Hugging Face Transformers + Gradio.
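
The flow described above can be sketched roughly as follows. This is a minimal illustration, not necessarily what app.py contains: the helper names (`load_model`, `describe_screenshot`, `build_demo`), the `max_new_tokens` value, and the fallback message are assumptions made for the example. Only the model id `google/pix2struct-screen2words-large` and the Transformers + Gradio stack come from this README.

```python
# Hypothetical sketch of a Pix2Struct screenshot describer.
# Function names and defaults are illustrative assumptions.

MODEL_ID = "google/pix2struct-screen2words-large"


def load_model(model_id=MODEL_ID):
    """Load the Pix2Struct processor and model (downloads weights on first use)."""
    # Imported lazily so the pure helper below can be used without these packages.
    from transformers import (
        Pix2StructForConditionalGeneration,
        Pix2StructProcessor,
    )

    processor = Pix2StructProcessor.from_pretrained(model_id)
    model = Pix2StructForConditionalGeneration.from_pretrained(model_id)
    return processor, model


def describe_screenshot(image, processor, model, max_new_tokens=50):
    """Return a short text description for a screenshot (PIL image)."""
    if image is None:
        # Assumed fallback for an empty upload.
        return "Please upload a screenshot."
    # The processor converts the image into the patch tensors the model expects.
    inputs = processor(images=image, return_tensors="pt")
    generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    text = processor.decode(generated_ids[0], skip_special_tokens=True)
    return text.strip()


def build_demo():
    """Wire the describer into a simple Gradio interface."""
    import gradio as gr

    processor, model = load_model()
    return gr.Interface(
        fn=lambda image: describe_screenshot(image, processor, model),
        inputs=gr.Image(type="pil", label="Screenshot"),
        outputs=gr.Textbox(label="Description"),
        title="UI Screen Describer with Pix2Struct",
    )
```

Calling `build_demo().launch()` would start the interface locally. Passing the processor and model into `describe_screenshot` as arguments keeps that function easy to exercise with stand-in objects, without downloading the model.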