UI-TARS 1.5-7B Model Setup Commands
This document contains all the commands executed to download, convert, and quantize the ByteDance-Seed/UI-TARS-1.5-7B model for use with Ollama.
Prerequisites
1. Verify Ollama Installation
ollama --version
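Several other tools are needed in later steps, so it can help to check for all of them up front. The following is a minimal sketch; the helper name missing_commands and the tool list in the example call are assumptions drawn from the commands used in this guide, not part of the original setup.

```shell
#!/usr/bin/env bash
# Report which of the given commands are missing from PATH.
# Prints one missing command per line; returns non-zero if any are missing.
missing_commands() {
  local rc=0
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      rc=1
    fi
  done
  return "$rc"
}

# Example call (tool list assumed from the steps in this guide):
# missing_commands ollama git cmake make python3 huggingface-cli
```

If the example call prints nothing, every listed tool was found.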
2. Install System Dependencies
# Install sentencepiece via Homebrew
brew install sentencepiece
# Install Python packages
pip3 install sentencepiece gguf protobuf huggingface_hub
Step 1: Download the UI-TARS Model
Create directory and download model
# Create directory for the model
mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b
# Change to the directory
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b
# Download the complete model from HuggingFace
huggingface-cli download ByteDance-Seed/UI-TARS-1.5-7B --local-dir . --local-dir-use-symlinks False
# Verify download
ls -la
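The commands in this guide hard-code paths under /Users/qoneqt/Desktop/shubham/ai. To run them on another machine, one option is to define the locations once and substitute the variables into each command. This is only a sketch: the variable names and the default of $HOME/ai are assumptions, not part of the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical variables: set AI_DIR to wherever the files should live.
AI_DIR="${AI_DIR:-$HOME/ai}"

# Directory layout mirroring the one used throughout this guide.
MODEL_DIR="$AI_DIR/ui-tars-1.5-7b"       # downloaded safetensors model
GGUF_DIR="$AI_DIR/ui-tars-1.5-7b-gguf"   # converted and quantized GGUF files
LLAMA_DIR="$AI_DIR/llama.cpp"            # llama.cpp checkout

echo "Model files: $MODEL_DIR"
echo "GGUF files:  $GGUF_DIR"
echo "llama.cpp:   $LLAMA_DIR"
```

With these set, a command such as mkdir -p "$MODEL_DIR" replaces the corresponding absolute path.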
Step 2: Setup llama.cpp for Conversion
Clone and build llama.cpp
# Navigate to AI directory
cd /Users/qoneqt/Desktop/shubham/ai
# Clone llama.cpp repository
git clone https://github.com/ggerganov/llama.cpp.git
# Navigate to llama.cpp directory
cd llama.cpp
# Create build directory and configure with CMake
mkdir build
cd build
cmake ..
# Build the project (this will take a few minutes)
make -j$(sysctl -n hw.ncpu)
# Verify the quantize tool was built
ls -la bin/llama-quantize
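The sysctl -n hw.ncpu call in the build command is specific to macOS. For running the same build on Linux as well, a small helper can pick whichever CPU-count tool is available. This is a sketch under that assumption; cpu_count is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the number of online CPUs, trying several tools in turn.
cpu_count() {
  if command -v nproc >/dev/null 2>&1; then
    nproc                                   # GNU coreutils (most Linux systems)
  elif sysctl -n hw.ncpu >/dev/null 2>&1; then
    sysctl -n hw.ncpu                       # macOS / BSD
  else
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1   # fall back to 1
  fi
}

# Example use in the build step:
# make -j"$(cpu_count)"
```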
Step 3: Convert Safetensors to GGUF Format
Create output directory and convert to F16 GGUF
# Create directory for GGUF files
mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
# Navigate to llama.cpp directory
cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp
# Convert safetensors to F16 GGUF (this takes ~5-10 minutes)
python3 convert_hf_to_gguf.py /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b \
--outfile /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
--outtype f16
# Check the F16 file size
ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf
Step 4: Quantize to Q4_K_M Format
Quantize the F16 model to reduce size
# Navigate to the build directory
cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp/build
# Quantize F16 to Q4_K_M (this takes ~1-2 minutes)
./bin/llama-quantize \
/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf \
q4_k_m
# Check the quantized file size
ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf
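To express the effect of quantization as a percentage rather than comparing two ls -lh listings by eye, the two sizes can be fed through a short calculation. The helper names below are hypothetical, and the example figures are the approximate sizes listed later in this guide.

```shell
#!/usr/bin/env bash
# Print the percentage by which `new` is smaller than `old`,
# rounded to the nearest integer.
percent_smaller() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.0f\n", (old - new) * 100 / old }'
}

# Print a file's size in bytes (wc -c works on both macOS and Linux).
file_bytes() {
  wc -c < "$1" | tr -d ' '
}

# Example with the approximate sizes from this guide (15GB vs 4.7GB):
# percent_smaller 15 4.7    # prints 69
#
# Example with real files:
# percent_smaller "$(file_bytes ui-tars-1.5-7b-f16.gguf)" "$(file_bytes ui-tars-1.5-7b-q4_k_m.gguf)"
```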
Step 5: Create Modelfiles for Ollama
Create Modelfile for F16 version
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
cat > Modelfile << 'EOF'
FROM ./ui-tars-1.5-7b-f16.gguf
TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.
Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
Create Modelfile for quantized version
cat > Modelfile-q4 << 'EOF'
FROM ./ui-tars-1.5-7b-q4_k_m.gguf
TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.
Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
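The two Modelfiles above differ only in their FROM line, so the second one can also be derived from the first instead of being written out in full. A minimal sketch, assuming the first file is already on disk; retarget_modelfile is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Copy a Modelfile, pointing its FROM line at a different GGUF file.
# Arguments: source Modelfile, new GGUF path, destination Modelfile.
retarget_modelfile() {
  local src="$1" gguf="$2" dest="$3"
  # Rewrite only lines that start with "FROM "; pass everything else through.
  awk -v gguf="$gguf" '/^FROM / { print "FROM " gguf; next } { print }' "$src" > "$dest"
}

# Example call:
# retarget_modelfile Modelfile ./ui-tars-1.5-7b-q4_k_m.gguf Modelfile-q4
```

Keeping a single source file means any later change to the template or parameters only needs to be made once.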
Step 6: Create Models in Ollama
Create the F16 model (high quality, larger size)
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
ollama create ui-tars:latest -f Modelfile
Create the quantized model (recommended for daily use)
ollama create ui-tars:q4 -f Modelfile-q4
Step 7: Verify Installation
List all available models
ollama list
Test the quantized model
ollama run ui-tars:q4 "Hello! Can you help me with UI automation tasks?"
Test with an image (if you have one) by including the file path in the prompt
ollama run ui-tars:q4 "Analyze this screenshot and tell me what UI elements you can see: /path/to/your/screenshot.png"
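Besides the CLI, a running Ollama server exposes a REST API whose /api/generate endpoint accepts base64-encoded images in an images array. The sketch below only builds the request body. It assumes the imported model actually accepts image input and that the server listens on the default local port; build_generate_payload is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Build a JSON request body for Ollama's /api/generate endpoint.
# Arguments: model name, prompt text, path to an image file.
# Note: the prompt is inserted verbatim, so it must not contain
# double quotes or backslashes.
build_generate_payload() {
  local img_b64
  # Strip newlines so the base64 data forms one JSON string.
  img_b64="$(base64 < "$3" | tr -d '\n')"
  printf '{"model":"%s","prompt":"%s","images":["%s"],"stream":false}\n' \
    "$1" "$2" "$img_b64"
}

# Example call, assuming a local Ollama server on the default port:
# build_generate_payload ui-tars:q4 "List the UI elements in this screenshot" screenshot.png \
#   | curl -s http://localhost:11434/api/generate -d @-
```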
File Sizes and Results
After completion, you should have:
- Original model: /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b/ (~15GB, 19 files)
- F16 GGUF: /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf (~14.5GB)
- Quantized GGUF: /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf (~4.4GB)
- Ollama models:
  - ui-tars:latest (~15GB in Ollama)
  - ui-tars:q4 (~4.7GB in Ollama) ⭐ Recommended for daily use
Usage Tips
- Use the quantized model (ui-tars:q4) for regular use - it's 69% smaller with minimal quality loss
- The model supports vision capabilities - you can send screenshots for UI analysis
- Proper image formats: PNG, JPEG, WebP are supported
- For UI automation: Provide clear screenshots and specific questions about what you want to automate
Cleanup (Optional)
If you want to save disk space after setup:
# Remove the original downloaded files (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b
# Remove the F16 GGUF if you only need the quantized version (optional)
rm /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf
# Remove llama.cpp if no longer needed (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/llama.cpp
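Because the F16 file takes several minutes to regenerate, it may be worth confirming that the quantized file exists before deleting it. The following is a cautious sketch of that check; remove_f16_if_quantized is a hypothetical helper, not part of the original commands.

```shell
#!/usr/bin/env bash
# Delete the F16 GGUF only if the quantized GGUF exists and is non-empty.
# Arguments: path to the F16 file, path to the quantized file.
remove_f16_if_quantized() {
  local f16="$1" q4="$2"
  if [ -s "$q4" ]; then
    rm -f -- "$f16"
  else
    echo "Refusing to delete $f16: $q4 is missing or empty" >&2
    return 1
  fi
}

# Example call from inside the GGUF directory:
# remove_f16_if_quantized ui-tars-1.5-7b-f16.gguf ui-tars-1.5-7b-q4_k_m.gguf
```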
Total Setup Time: ~20-30 minutes (depending on download and conversion speeds)
Final Model Size: 4.7GB (quantized) vs 15GB (original), a 69% size reduction