Investment Access Request - G-Operator

G-Operator is available exclusively to qualified investors under NDA. Access is restricted to investment evaluation purposes only.

By requesting access, you acknowledge that this model is proprietary technology subject to NDA restrictions. You agree to use this model solely for investment evaluation purposes and maintain strict confidentiality of all technical details, training methodologies, and performance characteristics. Unauthorized use, reproduction, or distribution is strictly prohibited.

G-Operator: Android Device Control with Gemma 3N

Multimodal Android Device Control Agent

🌟 Overview

G-Operator is a fine-tuned multimodal AI agent based on Google's Gemma 3N-E4B-IT model, specifically designed for Android device control through visual understanding and action generation. The model can analyze Android device screenshots and generate precise JSON actions to control the device.

🔐 Investment Access Control

This model is proprietary technology available exclusively to qualified investors under NDA restrictions. Access is granted solely for investment evaluation purposes.

📦 Available Model Versions

This repository contains multiple versions of the G-Operator model:

🎯 Recommended: Merged Model

  • gemma3n_e4b_it_merged: Complete merged model ready for inference
  • Best for: Production use and direct inference
  • Size: Full model weights (merged LoRA adapters)

🔄 Training Checkpoints

  • checkpoint-5500: Training checkpoint at 5,500 steps
  • checkpoint-6000: Training checkpoint at 6,000 steps
  • checkpoint-6252: Final training checkpoint at 6,252 steps
  • Best for: Resuming training or analysis of training progression

🔧 LoRA Adapter

  • adapter_model.safetensors: LoRA adapter weights
  • Best for: Parameter-efficient fine-tuning or adapter-based inference

🚀 Key Features

  • Multimodal Understanding: Processes both text instructions and Android device screenshots
  • JSON Action Generation: Outputs structured JSON actions for device control
  • LoRA Fine-tuning: Parameter-efficient fine-tuning with low-rank adapters
  • Android-Specific Training: Trained on real Android control episodes
  • High Performance: Built on the multimodal Gemma 3N-E4B-IT base model

📋 Model Details

| Property | Value |
| --- | --- |
| Base Model | google/gemma-3n-E4B-it |
| Architecture | Gemma 3N (E4B, 4B effective parameters) |
| Fine-tuning Method | LoRA (Low-Rank Adaptation) |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Target Modules | q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj |
| Training Data | Android control episodes with screenshots and actions |
| License | Gemma 3N License |
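
The rank and alpha values above determine how strongly the adapter update is weighted and how many parameters each adapted layer gains. The sketch below works through that arithmetic, assuming standard LoRA scaling (alpha / rank, not the rank-stabilized variant). The 2048 x 2048 layer shape is a placeholder for illustration only and is not taken from Gemma 3N.

```python
def lora_scaling(rank: int, alpha: int) -> float:
    """Scaling applied to the low-rank update: delta_W = (alpha / rank) * B @ A."""
    return alpha / rank


def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added to one linear layer: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


RANK, ALPHA = 32, 64  # values from the Model Details table

scaling = lora_scaling(RANK, ALPHA)
# Hypothetical 2048 x 2048 projection, used only to show the arithmetic.
added = lora_added_params(d_in=2048, d_out=2048, rank=RANK)
print(f"scaling={scaling}, added params per layer={added}")
```

With rank 32 and alpha 64 the update is scaled by 2, so the adapter's contribution is doubled relative to an unscaled low-rank product.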

🛠️ Installation

Prerequisites

Before installing the model, you must:

  1. Request Access: Click the "Request Access" button on this page and fill out the form
  2. Wait for Approval: Access requests are typically reviewed within 1-2 business days
  3. Authenticate: Once approved, you'll need to authenticate with Hugging Face

Authentication Required

Important: You must be authenticated with Hugging Face to access this gated model. Ensure you have:

  1. Received access approval
  2. Logged in using huggingface-cli login or login() from huggingface_hub
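
For scripted environments, the login step can be done from Python. The following is a minimal sketch that assumes the token is supplied through the `HF_TOKEN` environment variable; the helper names `resolve_hf_token` and `authenticate` are illustrative and not part of any library.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the token from HF_TOKEN, or None if it is unset or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def authenticate() -> None:
    """Log in with HF_TOKEN if present, otherwise fall back to the interactive prompt."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import login

    token = resolve_hf_token()
    if token:
        login(token=token)
    else:
        login()  # prompts for a token interactively
```

Calling `authenticate()` once before any `from_pretrained` call should be enough, provided the access request has already been approved.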

Basic Usage (Merged Model)

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

# Load merged model and processor
model_id = "Tonic/g-operator"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)

# Prepare input
image = Image.open("android_screenshot.png").convert("RGB")
goal = "Open the Settings app"
instruction = "Navigate to the Settings app on the home screen"

# Build conversation
conversation = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "You are a helpful multimodal assistant specialized in Android device control. You respond with JSON actions to control Android devices."}
        ]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": f"Goal: {goal}\nStep: {instruction}\nRespond with a JSON action containing relevant keys (e.g., action_type, x, y, text, app_name, direction)."}
        ]
    }
]

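# Generate response
# tokenize=True and return_dict=True make the processor return tensors
# (input_ids, attention_mask, pixel_values) instead of a prompt string.
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)
input_len = inputs["input_ids"].shape[-1]

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
        top_p=0.9
    )

# Decode only the newly generated tokens
response = processor.tokenizer.decode(outputs[0][input_len:], skip_special_tokens=True)
print(response)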

Using LoRA Adapter

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText
from peft import PeftModel

# Load base model
base_model_id = "google/gemma-3n-E4B-it"
model = AutoModelForImageTextToText.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)

# Load LoRA adapter
adapter_model_id = "Tonic/g-operator"
model = PeftModel.from_pretrained(model, adapter_model_id)

# Load processor
processor = AutoProcessor.from_pretrained(adapter_model_id, trust_remote_code=True)

# Use the same inference code as above...

Loading Specific Checkpoints

import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

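# Load a specific checkpoint from its subfolder in the repository.
# A Hub repo id has the form "namespace/name", so the checkpoint
# directory is passed separately through `subfolder`.
repo_id = "Tonic/g-operator"
checkpoint = "checkpoint-6252"  # or checkpoint-6000, checkpoint-5500
model = AutoModelForImageTextToText.from_pretrained(
    repo_id,
    subfolder=checkpoint,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(repo_id, subfolder=checkpoint, trust_remote_code=True)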

# Use the same inference code as above...

Expected Output Format

The model generates JSON actions in the following format:

{
  "action_type": "tap",
  "x": 540,
  "y": 1200,
  "text": "Settings",
  "app_name": "com.android.settings",
  "confidence": 0.95
}
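
Generated text may contain extra tokens around the JSON object, so it helps to extract and validate the action before using it. The sketch below is one way to do that with the standard library; `parse_action` is a hypothetical helper, and the only schema assumption it makes is that a valid action carries a string `action_type`.

```python
import json
from typing import Any, Dict


def parse_action(response: str) -> Dict[str, Any]:
    """Extract the first JSON object that has a string 'action_type' from a model response."""
    decoder = json.JSONDecoder()
    # Try each '{' as a possible start, since the model may wrap the JSON in prose.
    start = response.find("{")
    while start != -1:
        try:
            action, _ = decoder.raw_decode(response, start)
        except json.JSONDecodeError:
            start = response.find("{", start + 1)
            continue
        if isinstance(action, dict) and isinstance(action.get("action_type"), str):
            return action
        start = response.find("{", start + 1)
    raise ValueError("no JSON object with a string 'action_type' found in response")


raw = 'Action: {"action_type": "tap", "x": 540, "y": 1200} done'
print(parse_action(raw))
```

Raising on malformed output, rather than returning a default action, keeps a bad generation from silently turning into a device interaction.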

📊 Training Configuration

Training Parameters

| Parameter | Value |
| --- | --- |
| Learning Rate | 3e-4 |
| Batch Size | 1 (per device) |
| Gradient Accumulation | 16 |
| Epochs | 1.0 |
| Warmup Ratio | 0.1 |
| Weight Decay | 0.01 |
| Optimizer | AdamW |
| Scheduler | Cosine |
| Mixed Precision | bfloat16 |
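
To make these numbers concrete, the sketch below derives the effective batch size and a linear-warmup plus cosine-decay learning-rate curve from them. Two values are assumptions rather than facts from this card: that the final checkpoint step (6,252) is the last optimizer step, and that training ran on a single device. The exact scheduler implementation used in training is also not stated, so this mirrors only the common form of the schedule.

```python
import math

LEARNING_RATE = 3e-4      # from the table above
WARMUP_RATIO = 0.1        # from the table above
PER_DEVICE_BATCH = 1      # from the table above
GRAD_ACCUM_STEPS = 16     # from the table above
TOTAL_STEPS = 6252        # assumption: the final checkpoint step is the last optimizer step
NUM_DEVICES = 1           # assumption: the device count is not stated

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Linear warmup to the peak rate, then cosine decay to zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, warmup_steps)
for step in (0, warmup_steps, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions each optimizer step sees 16 samples, and the learning rate peaks at 3e-4 after 626 warmup steps.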

Vision Configuration

| Parameter | Value |
| --- | --- |
| Max Image Tokens | 256 |
| Min Image Tokens | 64 |
| Image Splitting | Enabled |
| Image Format | RGB |

🎯 Use Cases

1. Automated Testing

  • UI automation for Android apps
  • Regression testing with visual verification
  • Cross-device compatibility testing
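
For UI automation, each predicted action has to be translated into a device command. The sketch below maps an action to an `adb shell input` argument list without executing it. Only the `tap` action type appears in this card; the `input_text` and `swipe` names and the `x2`/`y2` end-point keys are hypothetical and would need to match the model's real action schema.

```python
from typing import Any, Dict, List


def action_to_adb(action: Dict[str, Any]) -> List[str]:
    """Build (but do not run) an adb argument list for one predicted action."""
    kind = action.get("action_type")
    if kind == "tap":
        return ["adb", "shell", "input", "tap", str(int(action["x"])), str(int(action["y"]))]
    if kind == "input_text":  # hypothetical action name
        # `input text` reads %s as a space, so encode spaces before sending.
        return ["adb", "shell", "input", "text", str(action["text"]).replace(" ", "%s")]
    if kind == "swipe":  # hypothetical action name and end-point keys
        coords = [action["x"], action["y"], action["x2"], action["y2"]]
        return ["adb", "shell", "input", "swipe", *(str(int(c)) for c in coords)]
    raise ValueError(f"unsupported action_type: {kind!r}")


print(action_to_adb({"action_type": "tap", "x": 540, "y": 1200}))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` against an emulator or dedicated test device once the mapping has been reviewed.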

2. Accessibility Support

  • Voice-controlled device navigation
  • Assistive technology integration
  • Screen reader enhancement

3. Remote Device Management

  • Remote troubleshooting
  • Device configuration automation
  • Support ticket resolution

4. App Development

  • UI/UX testing automation
  • User flow validation
  • Performance testing

🔒 Safety and Limitations

Safety Considerations

  • Device Control: Model generates actions that can modify device state
  • Testing Environment: Always test in controlled environments first
  • Human Oversight: Implement safety checks for critical operations
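
One way to implement such a safety check is a gate that every predicted action must pass before it reaches the device. The sketch below is illustrative only: the allowlist, the 0.8 confidence threshold, and the 1080 x 2400 screen size in the example are assumptions, not values from this model's training or evaluation.

```python
from typing import Any, Dict, Tuple

ALLOWED_ACTIONS = {"tap"}  # extend deliberately; only "tap" is shown in this card
MIN_CONFIDENCE = 0.8       # hypothetical threshold


def check_action(action: Dict[str, Any], screen: Tuple[int, int]) -> Tuple[bool, str]:
    """Return (ok, reason); callers should block or escalate when ok is False."""
    width, height = screen
    if action.get("action_type") not in ALLOWED_ACTIONS:
        return False, "action type not on the allowlist"
    x, y = action.get("x"), action.get("y")
    if not (isinstance(x, (int, float)) and isinstance(y, (int, float))):
        return False, "missing or non-numeric coordinates"
    if not (0 <= x < width and 0 <= y < height):
        return False, "coordinates outside the screen"
    confidence = action.get("confidence")
    if isinstance(confidence, (int, float)) and confidence < MIN_CONFIDENCE:
        return False, "confidence below threshold"
    return True, "ok"


action = {"action_type": "tap", "x": 540, "y": 1200, "confidence": 0.95}
print(check_action(action, screen=(1080, 2400)))
```

Rejected actions can be logged and routed to a human reviewer instead of being executed.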

Known Limitations

  • Screen Resolution: Performance may vary with different screen sizes
  • App-Specific: Training focused on common Android apps
  • Language: Primarily English language support
  • Real-time: Not optimized for real-time video processing

📄 License & Terms

This model is proprietary technology owned by Tonic and is subject to strict licensing terms:

Investment Evaluation License

  • Purpose: Access granted solely for investment evaluation and due diligence
  • Restrictions: No commercial use, reproduction, or distribution without written consent
  • NDA Required: All access is subject to Non-Disclosure Agreement
  • Confidentiality: All technical details, training methodologies, and performance characteristics are confidential

Base Model Attribution

  • Gemma 3N-E4B-IT: Licensed under Gemma 3N License from Google
  • Fine-tuning: Proprietary to Tonic, subject to separate licensing terms

🙏 Acknowledgments

  • Google: For the base Gemma 3N model
  • Hugging Face: For the transformers library and hosting


Made with ❤️ by the Tonic Team
