🤖 My Attempt at ARC-AGI-3

Just my shot at tackling the ARC-AGI-3 Interactive Reasoning Benchmark. Spoiler alert: it's pretty hard! 😅

What This Is

This is a neural RL agent I built to try to solve the ls20 game in ARC-AGI-3. The challenge, as I framed it, is to convert 15s to 3s at specific grid positions. It sounds simple, but the agent has to figure out the pattern and the strategy on its own.
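To make the task a bit more concrete, here is a minimal sketch of how 15→3 conversions between two frames could be counted. It assumes a frame is a plain list of lists of integers; the function name `count_conversions` and the toy grids are hypothetical and not taken from the agent's code.

```python
from typing import List

Grid = List[List[int]]


def count_conversions(before: Grid, after: Grid, src: int = 15, dst: int = 3) -> int:
    """Count cells that held `src` in `before` and hold `dst` in `after`."""
    same_shape = len(before) == len(after) and all(
        len(row_b) == len(row_a) for row_b, row_a in zip(before, after)
    )
    if not same_shape:
        raise ValueError("grids must have the same shape")
    return sum(
        1
        for row_b, row_a in zip(before, after)
        for cell_b, cell_a in zip(row_b, row_a)
        if cell_b == src and cell_a == dst
    )


# Toy frames made up for illustration: two of the three 15s become 3s.
before = [[15, 0, 15], [0, 15, 0]]
after = [[3, 0, 15], [0, 3, 0]]
print(count_conversions(before, after))  # 2
```

A per-step count like this is one simple way to tell whether an action actually did anything useful.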

The Good News:

The agent learned to recognize the 15→3 conversion pattern
It got decent at picking the right actions (ACTION3 works 92.7% of the time)
No more infinite reset loops (that was annoying)
It actually plays differently in the early game and the late game

The Reality Check:

Still struggling to complete full levels consistently
No fancy video demos (maybe next time?)
Evaluation results are still a work in progress 📊❓ (the numbers so far are in the Evaluation Results section below)
It's an honest attempt, not a breakthrough

Quick Usage

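import torch

from agents.neural_rl_agent import NeuralRLAgent

# Build the agent for the ls20 game (replace "your_card" with your own card ID)
agent = NeuralRLAgent(card_id="your_card", game_id="ls20")

# Load the best checkpoint; map_location="cpu" lets it load on machines without a GPU
checkpoint = torch.load("goal_neural_agent_best.pt", map_location="cpu")
agent.load_state_dict(checkpoint["model_state_dict"])

# Try it out (your mileage may vary).
# current_state is the latest observation you got from the game.
action = agent.act(current_state)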

What I Learned

Pattern recognition is tough for RL agents
Reward shaping matters A LOT
Sometimes the AI finds patterns you didn't expect
ARC-AGI-3 is genuinely challenging (respect to the creators)
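Since reward shaping turned out to matter so much, here is a minimal sketch of what a shaped reward for this task might look like. The function `shaped_reward` and all of its weights are assumptions for illustration, not the values used in training.

```python
def shaped_reward(
    conversions: int,
    level_completed: bool,
    conversion_bonus: float = 1.0,   # hypothetical weight per 15→3 conversion
    step_penalty: float = 0.01,      # hypothetical cost per step, discourages stalling
    completion_bonus: float = 10.0,  # hypothetical bonus for finishing a level
) -> float:
    """Combine a dense conversion signal with a sparse level-completion signal."""
    reward = conversions * conversion_bonus - step_penalty
    if level_completed:
        reward += completion_bonus
    return reward


print(shaped_reward(conversions=0, level_completed=False))  # a wasted step costs a little
print(shaped_reward(conversions=2, level_completed=False))  # conversions are rewarded
print(shaped_reward(conversions=1, level_completed=True))   # finishing dominates
```

The idea is that the dense per-conversion term gives the agent something to learn from long before it ever completes a level.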

Files Included

Models: A few different training checkpoints to try
Code: The neural architecture and training scripts
Analysis: Some pattern analysis I did to understand what works
Config: Training setup and hyperparameters

What's Included vs. Missing

✅ Evaluation results: comprehensive_evaluation_results.json
✅ Demo visualization: agent_demo_visualization.png
✅ Performance plots: Action analysis, learning progression, pattern recognition
❌ Video demonstrations (on the todo list)
❌ Comparison with other approaches

This is more of a "here's what I tried" than a "here's the solution." But hey, that's how research works sometimes! 🤷‍♂️

📊 Evaluation Results

Quick Stats:

23.4% overall success rate (not bad for ARC-AGI-3!)
8.7% completion rate (still working on this)
ACTION3 is the MVP with 92.7% effectiveness
135 successful conversions out of 407 attempts
170x improvement over random baseline (biggest win!)
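As a quick sanity check on the conversion numbers above, the per-attempt rate can be computed directly. Note that it comes out to about 33.2%, which is not the same figure as the 23.4% overall success rate, so the two presumably measure different things. The helper name `conversion_rate` is hypothetical.

```python
def conversion_rate(successes: int, attempts: int) -> float:
    """Fraction of attempts that succeeded; 0.0 when there were no attempts."""
    if attempts == 0:
        return 0.0
    return successes / attempts


rate = conversion_rate(135, 407)
print(f"{rate:.1%}")  # 33.2%
```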

What Actually Works:

Early game: Spam ACTION3 (it's surprisingly effective)
Mid game: Mix ACTION3 and ACTION1 strategically
Late game: Focus on ACTION1 for cleanup
Avoid ACTION0 and ACTION5 completely (learned the hard way)
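The phase-based strategy above can be sketched as a simple hand-written policy. This is not what the neural agent does internally. The phase boundaries (first third and last third of a step budget), the uniform mid-game mix, and the function name `choose_action` are all assumptions for illustration.

```python
import random


def choose_action(step: int, max_steps: int, rng: random.Random) -> str:
    """Pick an action name based on how far into the episode we are."""
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    if 3 * step < max_steps:
        # Early game: rely on ACTION3.
        return "ACTION3"
    if 3 * step < 2 * max_steps:
        # Mid game: mix ACTION3 and ACTION1 (uniform mix is an assumption).
        return rng.choice(["ACTION3", "ACTION1"])
    # Late game: ACTION1 for cleanup. ACTION0 and ACTION5 are never chosen.
    return "ACTION1"


rng = random.Random(0)  # seeded so the mid-game mix is reproducible
actions = [choose_action(step, 30, rng) for step in range(30)]
print(actions[:3], actions[-3:])
```

A fixed policy like this also makes a handy baseline to compare the learned agent against.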

Figure: The agent in action, converting 15s to 3s in the ls20 pattern.

Figure: Action analysis showing why ACTION3 is the clear winner.

Want to Improve It?

Feel free to:

Add proper evaluation metrics
Create video demos of the agent in action
Compare with other ARC-AGI-3 approaches
Fix whatever I probably broke

Built with PyTorch, caffeine, and stubborn determination to make an AI that can count to 3. ☕