Investment Access Request - G-Operator
G-Operator is available exclusively to qualified investors under NDA. Access is restricted to investment evaluation purposes only.
By requesting access, you acknowledge that this model is proprietary technology subject to NDA restrictions. You agree to use this model solely for investment evaluation purposes and maintain strict confidentiality of all technical details, training methodologies, and performance characteristics. Unauthorized use, reproduction, or distribution is strictly prohibited.
G-Operator: Android Device Control with Gemma 3N
Overview
G-Operator is a fine-tuned multimodal AI agent based on Google's Gemma 3N-E4B-IT model, specifically designed for Android device control through visual understanding and action generation. The model can analyze Android device screenshots and generate precise JSON actions to control the device.
Investment Access Control
This model is proprietary technology available exclusively to qualified investors under NDA restrictions. Access is granted solely for investment evaluation purposes.
Available Model Versions
This repository contains multiple versions of the G-Operator model:
Recommended: Merged Model
`gemma3n_e4b_it_merged`: Complete merged model ready for inference
- Best for: Production use and direct inference
- Size: Full model weights (merged LoRA adapters)
Training Checkpoints
`checkpoint-5500`: Training checkpoint at 5,500 steps
`checkpoint-6000`: Training checkpoint at 6,000 steps
`checkpoint-6252`: Final training checkpoint at 6,252 steps
- Best for: Resuming training or analysis of training progression
LoRA Adapter
`adapter_model.safetensors`: LoRA adapter weights
- Best for: Parameter-efficient fine-tuning or adapter-based inference
Key Features
- Multimodal Understanding: Processes both text instructions and Android device screenshots
- JSON Action Generation: Outputs structured JSON actions for device control
- LoRA Fine-tuning: Parameter-efficient fine-tuning via low-rank adapters
- Android-Specific Training: Trained on real Android control episodes
- Gemma 3N Backbone: Built on Google's Gemma 3N-E4B-IT architecture
Model Details
| Property | Value |
|---|---|
| Base Model | google/gemma-3n-E4B-it |
| Architecture | Gemma 3N (4B parameters) |
| Fine-tuning Method | LoRA (Low-Rank Adaptation) |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Target Modules | q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj |
| Training Data | Android control episodes with screenshots and actions |
| License | Gemma 3N License |
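For reference, the LoRA settings above map onto a `peft` configuration along these lines (an illustrative sketch, not the proprietary training script):

```python
from peft import LoraConfig

# Illustrative config matching the table above; other settings
# (dropout, bias handling, etc.) are not disclosed and omitted here.
lora_config = LoraConfig(
    r=32,            # LoRA rank
    lora_alpha=64,   # LoRA scaling factor
    target_modules=[
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```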
Installation
Prerequisites
Before installing the model, you must:
- Request Access: Click the "Request Access" button on this page and fill out the form
- Wait for Approval: Access requests are typically reviewed within 1-2 business days
- Authenticate: Once approved, you'll need to authenticate with Hugging Face
Authentication Required
Important: You must be authenticated with Hugging Face to access this gated model. Ensure you have:
- Received access approval
- Logged in using `huggingface-cli login` or `login()` from `huggingface_hub`
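For example, from Python:

```python
from huggingface_hub import login

# Prompts for a Hugging Face access token with read access
# to this gated repository.
login()
```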
Basic Usage (Merged Model)
```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText
# Load merged model and processor
model_id = "Tonic/g-operator"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
trust_remote_code=True,
device_map="auto"
)
# Prepare input
image = Image.open("android_screenshot.png").convert("RGB")
goal = "Open the Settings app"
instruction = "Navigate to the Settings app on the home screen"
# Build conversation
conversation = [
{
"role": "system",
"content": [
{"type": "text", "text": "You are a helpful multimodal assistant specialized in Android device control. You respond with JSON actions to control Android devices."}
]
},
{
"role": "user",
"content": [
{"type": "image", "image": image},
{"type": "text", "text": f"Goal: {goal}\nStep: {instruction}\nRespond with a JSON action containing relevant keys (e.g., action_type, x, y, text, app_name, direction)."}
]
}
]
# Tokenize the conversation (text + image) and generate a response
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
        top_p=0.9
    )

# Decode only the newly generated tokens
response = processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```
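For reproducible evaluation runs, you can disable sampling with `do_sample=False` (greedy decoding) or fix the seed via `torch.manual_seed`; the sampling settings above favor output diversity over determinism.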
Using LoRA Adapter
```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText
from peft import PeftModel
# Load base model
base_model_id = "google/gemma-3n-E4B-it"
model = AutoModelForImageTextToText.from_pretrained(
base_model_id,
torch_dtype=torch.bfloat16,
trust_remote_code=True,
device_map="auto"
)
# Load LoRA adapter
adapter_model_id = "Tonic/g-operator"
model = PeftModel.from_pretrained(model, adapter_model_id)
# Load processor
processor = AutoProcessor.from_pretrained(adapter_model_id, trust_remote_code=True)
# Use the same inference code as above...
```
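If you need a standalone model after loading the adapter, `peft` can fold the LoRA weights back into the base model:

```python
# Optional: merge the adapter into the base weights so inference
# no longer requires peft at runtime.
model = model.merge_and_unload()
```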
Loading Specific Checkpoints
```python
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText
# Load a specific training checkpoint from its subfolder in the repo
repo_id = "Tonic/g-operator"
checkpoint = "checkpoint-6252"  # or "checkpoint-6000", "checkpoint-5500"
model = AutoModelForImageTextToText.from_pretrained(
    repo_id,
    subfolder=checkpoint,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(repo_id, subfolder=checkpoint, trust_remote_code=True)
# Use the same inference code as above...
```
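Note that Hugging Face `Trainer` checkpoints typically also include optimizer and scheduler state alongside the model weights, so they are larger than the merged model and are mainly useful for resuming training.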
Expected Output Format
The model generates JSON actions in the following format:
```json
{
"action_type": "tap",
"x": 540,
"y": 1200,
"text": "Settings",
"app_name": "com.android.settings",
"confidence": 0.95
}
```
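Acting on such an output requires a thin execution layer. Below is a minimal, hypothetical sketch that parses the JSON and dispatches actions through `adb`; the action-type names follow the example above, and the `adb` invocations assume a connected device with USB debugging enabled:

```python
import json
import subprocess

def execute_action(raw: str) -> None:
    """Parse a model-generated JSON action and dispatch it via adb."""
    action = json.loads(raw)
    action_type = action.get("action_type")

    if action_type == "tap":
        # `adb shell input tap X Y` taps the given screen coordinates
        subprocess.run(
            ["adb", "shell", "input", "tap", str(action["x"]), str(action["y"])],
            check=True,
        )
    elif action_type == "open_app" and action.get("app_name"):
        # Launch an app by package name via the monkey tool
        subprocess.run(
            ["adb", "shell", "monkey", "-p", action["app_name"],
             "-c", "android.intent.category.LAUNCHER", "1"],
            check=True,
        )
    else:
        raise ValueError(f"Unsupported action: {action_type}")

execute_action('{"action_type": "tap", "x": 540, "y": 1200}')
```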
Training Configuration
Training Parameters
| Parameter | Value |
|---|---|
| Learning Rate | 3e-4 |
| Batch Size | 1 (per device) |
| Gradient Accumulation | 16 |
| Epochs | 1.0 |
| Warmup Ratio | 0.1 |
| Weight Decay | 0.01 |
| Optimizer | AdamW |
| Scheduler | Cosine |
| Mixed Precision | bfloat16 |
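These hyperparameters map onto a `transformers.TrainingArguments` along the following lines (a sketch for orientation, not the proprietary training script; `output_dir` is a placeholder):

```python
from transformers import TrainingArguments

# Illustrative mapping of the table above onto TrainingArguments
training_args = TrainingArguments(
    output_dir="g-operator-finetune",  # placeholder path
    learning_rate=3e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,    # effective batch size of 16
    num_train_epochs=1.0,
    warmup_ratio=0.1,
    weight_decay=0.01,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    bf16=True,
)
```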
Vision Configuration
| Parameter | Value |
|---|---|
| Max Image Tokens | 256 |
| Min Image Tokens | 64 |
| Image Splitting | Enabled |
| Image Format | RGB |
Use Cases
1. Automated Testing
- UI automation for Android apps
- Regression testing with visual verification
- Cross-device compatibility testing
2. Accessibility Support
- Voice-controlled device navigation
- Assistive technology integration
- Screen reader enhancement
3. Remote Device Management
- Remote troubleshooting
- Device configuration automation
- Support ticket resolution
4. App Development
- UI/UX testing automation
- User flow validation
- Performance testing
Safety and Limitations
Safety Considerations
- Device Control: Model generates actions that can modify device state
- Testing Environment: Always test in controlled environments first
- Human Oversight: Implement safety checks for critical operations (see the sketch below)
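As one illustration, a caller can gate generated actions behind an allowlist before anything touches a real device; the action-type names below are hypothetical and should match whatever schema your deployment uses:

```python
# Minimal guard: only pass vetted action types through to the device.
# The allowlist and confirmation set below are illustrative examples.
SAFE_ACTIONS = {"tap", "scroll", "type_text"}
REQUIRE_CONFIRMATION = {"uninstall", "factory_reset"}

def vet_action(action: dict) -> dict:
    action_type = action.get("action_type")
    if action_type in REQUIRE_CONFIRMATION:
        raise PermissionError(f"'{action_type}' requires human approval")
    if action_type not in SAFE_ACTIONS:
        raise ValueError(f"Blocked unlisted action: {action_type}")
    return action
```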
Known Limitations
- Screen Resolution: Performance may vary with different screen sizes
- App-Specific: Training focused on common Android apps
- Language: Primarily English language support
- Real-time: Not optimized for real-time video processing
License & Terms
This model is proprietary technology owned by Tonic and is subject to strict licensing terms:
Investment Evaluation License
- Purpose: Access granted solely for investment evaluation and due diligence
- Restrictions: No commercial use, reproduction, or distribution without written consent
- NDA Required: All access is subject to Non-Disclosure Agreement
- Confidentiality: All technical details, training methodologies, and performance characteristics are confidential
Base Model Attribution
- Gemma 3N-E4B-IT: Licensed under Gemma 3N License from Google
- Fine-tuning: Proprietary to Tonic, subject to separate licensing terms
Acknowledgments
- Google: For the base Gemma 3N model
- Hugging Face: For the transformers library and hosting