Investment Access Request - G-Operator
G-Operator is available exclusively to qualified investors under NDA. Access is restricted to investment evaluation purposes only.
By requesting access, you acknowledge that this model is proprietary technology subject to NDA restrictions. You agree to use this model solely for investment evaluation purposes and maintain strict confidentiality of all technical details, training methodologies, and performance characteristics. Unauthorized use, reproduction, or distribution is strictly prohibited.
G-Operator: Android Device Control with Gemma 3N
Overview
G-Operator is a fine-tuned multimodal AI agent based on Google's Gemma 3N-E4B-IT model, specifically designed for Android device control through visual understanding and action generation. The model can analyze Android device screenshots and generate precise JSON actions to control the device.
Investment Access Control
This model is proprietary technology available exclusively to qualified investors under NDA restrictions. Access is granted solely for investment evaluation purposes.
Available Model Versions
This repository contains multiple versions of the G-Operator model:
Recommended: Merged Model
`gemma3n_e4b_it_merged`: Complete merged model ready for inference
- Best for: Production use and direct inference
- Size: Full model weights (merged LoRA adapters)
Training Checkpoints
`checkpoint-5500`: Training checkpoint at 5,500 steps
`checkpoint-6000`: Training checkpoint at 6,000 steps
`checkpoint-6252`: Final training checkpoint at 6,252 steps
- Best for: Resuming training or analysis of training progression
LoRA Adapter
`adapter_model.safetensors`: LoRA adapter weights
- Best for: Parameter-efficient fine-tuning or adapter-based inference
Key Features
- Multimodal Understanding: Processes both text instructions and Android device screenshots
- JSON Action Generation: Outputs structured JSON actions for device control
- LoRA Fine-tuning: Parameter-efficient fine-tuning via low-rank adapters
- Android-Specific Training: Trained on real Android control episodes
- Gemma 3N Backbone: Built on Google's Gemma 3N-E4B-IT architecture
Model Details
| Property | Value |
|---|---|
| Base Model | google/gemma-3n-E4B-it |
| Architecture | Gemma 3N (4B parameters) |
| Fine-tuning Method | LoRA (Low-Rank Adaptation) |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Target Modules | q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj |
| Training Data | Android control episodes with screenshots and actions |
| License | Gemma 3N License |
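For reference, the LoRA settings above map onto a `peft` configuration along these lines (an illustrative sketch, not the proprietary training script):

```python
from peft import LoraConfig

# Illustrative config matching the table above; other settings
# (dropout, bias handling, etc.) are not disclosed and omitted here.
lora_config = LoraConfig(
    r=32,            # LoRA rank
    lora_alpha=64,   # LoRA scaling factor
    target_modules=[
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```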
Installation
Prerequisites
Before installing the model, you must:
- Request Access: Click the "Request Access" button on this page and fill out the form
- Wait for Approval: Access requests are typically reviewed within 1-2 business days
- Authenticate: Once approved, you'll need to authenticate with Hugging Face
Authentication Required
Important: You must be authenticated with Hugging Face to access this gated model. Ensure you have:
- Received access approval
- Logged in using `huggingface-cli login` or `login()` from `huggingface_hub`
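For example, from Python:

```python
from huggingface_hub import login

# Prompts for a Hugging Face access token with read access
# to this gated repository.
login()
```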
Basic Usage (Merged Model)
```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText
# Load merged model and processor
model_id = "Tonic/g-operator"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
trust_remote_code=True,
device_map="auto"
)
# Prepare input
image = Image.open("android_screenshot.png").convert("RGB")
goal = "Open the Settings app"
instruction = "Navigate to the Settings app on the home screen"
# Build conversation
conversation = [
{
"role": "system",
"content": [
{"type": "text", "text": "You are a helpful multimodal assistant specialized in Android device control. You respond with JSON actions to control Android devices."}
]
},
{
"role": "user",
"content": [
{"type": "image", "image": image},
{"type": "text", "text": f"Goal: {goal}\nStep: {instruction}\nRespond with a JSON action containing relevant keys (e.g., action_type, x, y, text, app_name, direction)."}
]
}
]
# Tokenize the conversation (text + image) and generate a response
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
        top_p=0.9
    )

# Decode only the newly generated tokens
response = processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```
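For reproducible evaluation runs, you can disable sampling with `do_sample=False` (greedy decoding) or fix the seed via `torch.manual_seed`; the sampling settings above favor output diversity over determinism.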
Using LoRA Adapter
```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText
from peft import PeftModel
# Load base model
base_model_id = "google/gemma-3n-E4B-it"
model = AutoModelForImageTextToText.from_pretrained(
base_model_id,
torch_dtype=torch.bfloat16,
trust_remote_code=True,
device_map="auto"
)
# Load LoRA adapter
adapter_model_id = "Tonic/g-operator"
model = PeftModel.from_pretrained(model, adapter_model_id)
# Load processor
processor = AutoProcessor.from_pretrained(adapter_model_id, trust_remote_code=True)
# Use the same inference code as above...
```
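If you need a standalone model after loading the adapter, `peft` can fold the LoRA weights back into the base model:

```python
# Optional: merge the adapter into the base weights so inference
# no longer requires peft at runtime.
model = model.merge_and_unload()
```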
Loading Specific Checkpoints
```python
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText
# Load a specific training checkpoint from its subfolder in the repo
repo_id = "Tonic/g-operator"
checkpoint = "checkpoint-6252"  # or "checkpoint-6000", "checkpoint-5500"
model = AutoModelForImageTextToText.from_pretrained(
    repo_id,
    subfolder=checkpoint,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(repo_id, subfolder=checkpoint, trust_remote_code=True)
# Use the same inference code as above...
```
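Note that Hugging Face `Trainer` checkpoints typically also include optimizer and scheduler state alongside the model weights, so they are larger than the merged model and are mainly useful for resuming training.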
Expected Output Format
The model generates JSON actions in the following format:
```json
{
"action_type": "tap",
"x": 540,
"y": 1200,
"text": "Settings",
"app_name": "com.android.settings",
"confidence": 0.95
}
```
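Acting on such an output requires a thin execution layer. Below is a minimal, hypothetical sketch that parses the JSON and dispatches actions through `adb`; the action-type names follow the example above, and the `adb` invocations assume a connected device with USB debugging enabled:

```python
import json
import subprocess

def execute_action(raw: str) -> None:
    """Parse a model-generated JSON action and dispatch it via adb."""
    action = json.loads(raw)
    action_type = action.get("action_type")

    if action_type == "tap":
        # `adb shell input tap X Y` taps the given screen coordinates
        subprocess.run(
            ["adb", "shell", "input", "tap", str(action["x"]), str(action["y"])],
            check=True,
        )
    elif action_type == "open_app" and action.get("app_name"):
        # Launch an app by package name via the monkey tool
        subprocess.run(
            ["adb", "shell", "monkey", "-p", action["app_name"],
             "-c", "android.intent.category.LAUNCHER", "1"],
            check=True,
        )
    else:
        raise ValueError(f"Unsupported action: {action_type}")

execute_action('{"action_type": "tap", "x": 540, "y": 1200}')
```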
Training Configuration
Training Parameters
| Parameter | Value |
|---|---|
| Learning Rate | 3e-4 |
| Batch Size | 1 (per device) |
| Gradient Accumulation | 16 |
| Epochs | 1.0 |
| Warmup Ratio | 0.1 |
| Weight Decay | 0.01 |
| Optimizer | AdamW |
| Scheduler | Cosine |
| Mixed Precision | bfloat16 |
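These hyperparameters map onto a `transformers.TrainingArguments` along the following lines (a sketch for orientation, not the proprietary training script; `output_dir` is a placeholder):

```python
from transformers import TrainingArguments

# Illustrative mapping of the table above onto TrainingArguments
training_args = TrainingArguments(
    output_dir="g-operator-finetune",  # placeholder path
    learning_rate=3e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,    # effective batch size of 16
    num_train_epochs=1.0,
    warmup_ratio=0.1,
    weight_decay=0.01,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    bf16=True,
)
```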
Vision Configuration
| Parameter | Value |
|---|---|
| Max Image Tokens | 256 |
| Min Image Tokens | 64 |
| Image Splitting | Enabled |
| Image Format | RGB |
Use Cases
1. Automated Testing
- UI automation for Android apps
- Regression testing with visual verification
- Cross-device compatibility testing
2. Accessibility Support
- Voice-controlled device navigation
- Assistive technology integration
- Screen reader enhancement
3. Remote Device Management
- Remote troubleshooting
- Device configuration automation
- Support ticket resolution
4. App Development
- UI/UX testing automation
- User flow validation
- Performance testing
Safety and Limitations
Safety Considerations
- Device Control: Model generates actions that can modify device state
- Testing Environment: Always test in controlled environments first
- Human Oversight: Implement safety checks for critical operations (see the sketch below)
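As one illustration, a caller can gate generated actions behind an allowlist before anything touches a real device; the action-type names below are hypothetical and should match whatever schema your deployment uses:

```python
# Minimal guard: only pass vetted action types through to the device.
# The allowlist and confirmation set below are illustrative examples.
SAFE_ACTIONS = {"tap", "scroll", "type_text"}
REQUIRE_CONFIRMATION = {"uninstall", "factory_reset"}

def vet_action(action: dict) -> dict:
    action_type = action.get("action_type")
    if action_type in REQUIRE_CONFIRMATION:
        raise PermissionError(f"'{action_type}' requires human approval")
    if action_type not in SAFE_ACTIONS:
        raise ValueError(f"Blocked unlisted action: {action_type}")
    return action
```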
Known Limitations
- Screen Resolution: Performance may vary with different screen sizes
- App-Specific: Training focused on common Android apps
- Language: Primarily English language support
- Real-time: Not optimized for real-time video processing
License & Terms
This model is proprietary technology owned by Tonic and is subject to strict licensing terms:
Investment Evaluation License
- Purpose: Access granted solely for investment evaluation and due diligence
- Restrictions: No commercial use, reproduction, or distribution without written consent
- NDA Required: All access is subject to Non-Disclosure Agreement
- Confidentiality: All technical details, training methodologies, and performance characteristics are confidential
Base Model Attribution
- Gemma 3N-E4B-IT: Licensed under Gemma 3N License from Google
- Fine-tuning: Proprietary to Tonic, subject to separate licensing terms
Acknowledgments
- Google: For the base Gemma 3N model
- Hugging Face: For the transformers library and hosting