UI-TARS 1.5-7B Model Setup Commands

This document contains all the commands executed to download, convert, and quantize the ByteDance-Seed/UI-TARS-1.5-7B model for use with Ollama.

Prerequisites

1. Verify Ollama Installation

ollama --version

2. Install System Dependencies

# Install sentencepiece via Homebrew
brew install sentencepiece

# Install Python packages
pip3 install sentencepiece gguf protobuf huggingface_hub
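
Before continuing, it can help to confirm that the command-line tools used in later steps are on the PATH. The following is a minimal sketch; `missing_tools` is a hypothetical helper name, and the tool list in the example comment is simply collected from the commands in this document.

```shell
# Print the name of every command in the argument list that is not on PATH.
# missing_tools is a hypothetical helper, not part of Ollama or llama.cpp.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example (tools used in this guide):
#   missing_tools ollama git cmake make python3 pip3 huggingface-cli
```

An empty result means every listed tool was found; any name printed needs to be installed first.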

Step 1: Download the UI-TARS Model

Create directory and download model

# Create directory for the model
mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

# Change to the directory
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

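# Download the complete model from HuggingFace
# (--local-dir-use-symlinks is deprecated in recent huggingface_hub releases and can be dropped there)
huggingface-cli download ByteDance-Seed/UI-TARS-1.5-7B --local-dir . --local-dir-use-symlinks False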

# Verify download
ls -la
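
Beyond listing the directory, a quick sanity check can catch an interrupted download before the conversion step. This is a heuristic sketch only: it assumes a usable checkpoint has a `config.json` and at least one `.safetensors` shard, and `check_model_dir` is a hypothetical helper name.

```shell
# Return success if DIR looks like a complete Hugging Face checkpoint:
# a config.json plus at least one *.safetensors weight shard.
# check_model_dir is a hypothetical helper name.
check_model_dir() {
  dir=$1
  if [ ! -f "$dir/config.json" ]; then
    echo "missing config.json" >&2
    return 1
  fi
  for shard in "$dir"/*.safetensors; do
    if [ -f "$shard" ]; then
      return 0
    fi
  done
  echo "no .safetensors files found" >&2
  return 1
}

# Example: check_model_dir . && echo "download looks complete"
```

It does not verify checksums or shard counts, so a partially written shard would still pass.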

Step 2: Setup llama.cpp for Conversion

Clone and build llama.cpp

# Navigate to AI directory
cd /Users/qoneqt/Desktop/shubham/ai

# Clone llama.cpp repository
git clone https://github.com/ggerganov/llama.cpp.git

# Navigate to llama.cpp directory
cd llama.cpp

# Create build directory and configure with CMake
mkdir build
cd build
cmake ..

# Build the project (this will take a few minutes)
make -j$(sysctl -n hw.ncpu)

# Verify the quantize tool was built
ls -la bin/llama-quantize
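
The `sysctl -n hw.ncpu` call above is specific to macOS. If the same steps are run on Linux, a small wrapper can pick the right command. This is a sketch under that assumption; `cpu_count` is a hypothetical helper name.

```shell
# Print the number of CPU cores, trying Linux (nproc), macOS (sysctl),
# then a getconf fallback. cpu_count is a hypothetical helper name.
cpu_count() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  elif command -v sysctl >/dev/null 2>&1 && sysctl -n hw.ncpu >/dev/null 2>&1; then
    sysctl -n hw.ncpu
  else
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
  fi
}

# Example: make -j"$(cpu_count)"
```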

Step 3: Convert Safetensors to GGUF Format

Create output directory and convert to F16 GGUF

# Create directory for GGUF files
mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf

# Navigate to llama.cpp directory
cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp

# Install the conversion script's Python dependencies (run once)
pip3 install -r requirements.txt

# Convert safetensors to F16 GGUF (this takes ~5-10 minutes)
python3 convert_hf_to_gguf.py /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b \
  --outfile /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
  --outtype f16

# Check the F16 file size
ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf
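
File size alone does not show that the conversion produced a valid file. GGUF files begin with the four ASCII bytes `GGUF`, so a cheap header check is possible. The sketch below assumes `head -c` is available (it is on macOS and Linux); `is_gguf` is a hypothetical helper name.

```shell
# Return success if FILE exists and starts with the 4-byte GGUF magic ("GGUF").
# is_gguf is a hypothetical helper name.
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Example: is_gguf ui-tars-1.5-7b-f16.gguf && echo "header looks like GGUF"
```

This only inspects the header; it does not validate the tensors that follow.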

Step 4: Quantize to Q4_K_M Format

Quantize the F16 model to reduce size

# Navigate to the build directory
cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp/build

# Quantize F16 to Q4_K_M (this takes ~1-2 minutes)
./bin/llama-quantize \
  /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
  /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf \
  q4_k_m

# Check the quantized file size
ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf
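
To put a number on the size reduction instead of comparing the two `ls -lh` outputs by eye, the byte counts can be compared directly. A minimal sketch, with `size_reduction_pct` as a hypothetical helper name; the result is truncated to a whole percent and the first file is assumed to be non-empty.

```shell
# Print how much smaller NEW is than ORIG, as a whole-number percentage
# (rounded down). size_reduction_pct is a hypothetical helper name.
size_reduction_pct() {
  orig_bytes=$(( $(wc -c < "$1") ))
  new_bytes=$(( $(wc -c < "$2") ))
  echo $(( (orig_bytes - new_bytes) * 100 / orig_bytes ))
}

# Example:
#   size_reduction_pct ui-tars-1.5-7b-f16.gguf ui-tars-1.5-7b-q4_k_m.gguf
```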

Step 5: Create Modelfiles for Ollama

Create Modelfile for F16 version

cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf

cat > Modelfile << 'EOF'
FROM ./ui-tars-1.5-7b-f16.gguf

TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.

Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF

Create Modelfile for quantized version

cat > Modelfile-q4 << 'EOF'
FROM ./ui-tars-1.5-7b-q4_k_m.gguf

TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.

Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
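
The two Modelfiles differ only in their FROM line, so they can be generated from one template to keep them in sync. The following is a sketch, not the original author's method: `write_modelfile` is a hypothetical helper, the system prompt is shortened for brevity, and the stop token is assumed to be the `<|im_end|>` marker used by the template.

```shell
# Write an Ollama Modelfile for the GGUF file named in $1 to the path in $2.
# write_modelfile is a hypothetical helper; the system prompt is shortened here.
write_modelfile() {
  gguf=$1
  out=$2
  cat > "$out" << EOF
FROM ./$gguf

TEMPLATE """<|im_start|>system
You are UI-TARS, an AI assistant specialized in user interface automation.<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
}

# Example:
#   write_modelfile ui-tars-1.5-7b-f16.gguf Modelfile
#   write_modelfile ui-tars-1.5-7b-q4_k_m.gguf Modelfile-q4
```

The heredoc delimiter is left unquoted on purpose so that `$gguf` is expanded; nothing else in the template contains shell-special characters.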

Step 6: Create Models in Ollama

Create the F16 model (high quality, larger size)

cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
ollama create ui-tars:latest -f Modelfile

Create the quantized model (recommended for daily use)

ollama create ui-tars:q4 -f Modelfile-q4

Step 7: Verify Installation

List all available models

ollama list
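
For scripted setups, the listing can be checked automatically instead of by eye. This sketch assumes `ollama list` prints a header row followed by one row per model with the model name in the first column; `has_model` is a hypothetical helper name.

```shell
# Read `ollama list` output on stdin and succeed if MODEL appears in the
# first column (the header row is skipped). has_model is a hypothetical helper.
has_model() {
  awk -v model="$1" 'NR > 1 && $1 == model { found = 1 } END { exit !found }'
}

# Example: ollama list | has_model ui-tars:q4 && echo "ui-tars:q4 is installed"
```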

Test the quantized model

ollama run ui-tars:q4 "Hello! Can you help me with UI automation tasks?"

Test with an image (if you have one)

# ollama run has no --image flag; include the image path in the prompt instead
ollama run ui-tars:q4 "Analyze this screenshot and tell me what UI elements you can see: /path/to/your/screenshot.png"
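
Images can also be sent through Ollama's HTTP API, which accepts base64-encoded image data in the `images` field of a `/api/generate` request. The sketch below only builds the request body; `build_image_request` is a hypothetical helper, it assumes the prompt contains no double quotes or backslashes, and whether the model actually interprets the image depends on the vision weights being present in the imported model.

```shell
# Print a JSON request body for Ollama's /api/generate endpoint with one
# base64-encoded image attached. build_image_request is a hypothetical helper.
# Assumption: PROMPT contains no double quotes or backslashes (no JSON escaping is done).
build_image_request() {
  model=$1
  prompt=$2
  image_b64=$(base64 < "$3" | tr -d '\n')
  printf '{"model":"%s","prompt":"%s","images":["%s"],"stream":false}\n' \
    "$model" "$prompt" "$image_b64"
}

# Example, assuming a local Ollama server on the default port 11434:
#   build_image_request ui-tars:q4 "List the visible UI elements" screenshot.png |
#     curl -s http://localhost:11434/api/generate -d @-
```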

File Sizes and Results

After completion, you should have:

Original model: /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b/ (~15GB, 19 files)
F16 GGUF: /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf (~14.5GB)
Quantized GGUF: /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf (~4.4GB)
Ollama models:
- ui-tars:latest (~15GB in Ollama)
- ui-tars:q4 (~4.7GB in Ollama) ⭐ Recommended for daily use

Usage Tips

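- Use the quantized model (ui-tars:q4) for regular use: it is about 69% smaller (4.7GB vs 15GB in Ollama), typically with only a small quality loss.
- UI-TARS is a vision-language model designed to take screenshots for UI analysis. The conversion in Step 3 may export only the language-model weights, so confirm that image input works with your build before relying on it.
- Supported image formats: PNG, JPEG, and WebP.
- For UI automation, provide clear screenshots and specific questions about what you want to automate.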

Cleanup (Optional)

If you want to save disk space after setup:

# Remove the original downloaded files (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

# Remove the F16 GGUF if you only need the quantized version (optional)
rm /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf

# Remove llama.cpp if no longer needed (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/llama.cpp
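
Because the F16 file takes several minutes to regenerate, it may be worth guarding its deletion. A minimal sketch, with `remove_f16_if_quantized` as a hypothetical helper name: it deletes the F16 file only when the quantized file exists and is non-empty.

```shell
# Delete the F16 GGUF only if the quantized file exists and is non-empty.
# remove_f16_if_quantized is a hypothetical helper name.
remove_f16_if_quantized() {
  f16=$1
  quantized=$2
  if [ -s "$quantized" ]; then
    rm -f "$f16"
  else
    echo "refusing to delete $f16: $quantized is missing or empty" >&2
    return 1
  fi
}

# Example:
#   remove_f16_if_quantized ui-tars-1.5-7b-f16.gguf ui-tars-1.5-7b-q4_k_m.gguf
```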

Total Setup Time: ~20-30 minutes (depending on download and conversion speeds)

Final Model Size: 4.7GB (quantized) vs 15GB (original), a 69% size reduction