I Built My First AI, and You Can Too! A Complete Zero-to-One Guide
Artificial Intelligence (AI) once seemed like a "black box" to me, a field reserved for elite researchers. But I recently discovered that its core principles are surprisingly accessible. What if I told you that you could build a functioning conversational AI, from scratch, right now?
In this comprehensive guide, I'll walk you through every single step of my journey. This post contains everything you need: a step-by-step environment setup guide for Ubuntu, the full, unabridged source code with detailed explanations, and even a real-world bug we'll solve together. By the end, you won't just have run some code; you'll understand why it works.
Our goal is simple but powerful:
Build an AI model that learns to reply "hello to you too" when we say "hello".
Let's begin.
Prerequisite: Setting Up Your Development Environment (on Ubuntu)
Before we can build our AI, we need to prepare our workshop. This guide will walk you through setting up a clean, isolated Python environment. This is a best practice that prevents conflicts with other projects.
Part A: CPU-Only Installation (Recommended for Everyone)
This setup will work on any machine, regardless of whether you have a dedicated graphics card (GPU).
Step 1: Install Python and Essential Tools
Open your terminal and run the following commands to make sure you have Python, its package manager (pip), and its virtual environment tool (venv).
# First, update your package list
sudo apt update
# Then, install pip and venv for Python 3
sudo apt install python3-pip python3-venv -y
Step 2: Create a Project Directory and Virtual Environment
Let's create a dedicated folder for our project.
# Create a folder for your project
mkdir my_ai_project
cd my_ai_project
# Create a virtual environment named 'ai_env'
python3 -m venv ai_env
Step 3: Activate the Virtual Environment
You must activate the environment every time you work on this project.
source ai_env/bin/activate
You'll know it's active because your terminal prompt will change to show (ai_env), like this:
(ai_env) user@machine:~/my_ai_project$
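If a custom shell theme hides the prompt prefix, you can also ask Python itself. The sketch below relies on the standard library's documented behaviour that sys.prefix differs from sys.base_prefix inside a venv. The function name in_virtualenv is only an illustrative choice.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the system installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("Interpreter prefix:", sys.prefix)
```

Run it with python (not python3 from outside the environment) so that it reports on the interpreter you will actually use for the project.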
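Step 4: Install PyTorch
With your environment active, install PyTorch. On Linux, a plain pip install of torch pulls in the much larger CUDA-enabled build by default, so for a CPU-only setup point pip at the CPU wheel index:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu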
Part B: GPU Installation (For NVIDIA Users)
If you have an NVIDIA GPU and have already installed the appropriate drivers, you can install a version of PyTorch that will use your GPU for much faster training.
Follow Steps 1-3 from Part A. When you get to Step 4, use this command instead:
# Make sure your NVIDIA drivers are installed first!
# This command installs PyTorch with CUDA 12.1 support.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
Verify Your Installation
You can quickly check that everything is installed correctly by running this one-liner in your activated terminal:
python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}')"
If you installed the CPU version, it will show CUDA available: False. If you installed the GPU version and your drivers are set up correctly, it will show CUDA available: True.
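If the one-liner fails with an import error, the usual cause is that the environment isn't active. As a slightly friendlier variant, here is a small script sketch that reports that case instead of crashing. The function name describe_torch_install is made up for this example.

```python
import importlib.util


def describe_torch_install() -> str:
    """Return a one-line summary of the PyTorch installation, if any."""
    # find_spec returns None when the package cannot be found on the import path.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment."
    import torch  # imported lazily so the script still runs without PyTorch

    cuda = torch.cuda.is_available()
    return f"PyTorch {torch.__version__} found, CUDA available: {cuda}"


if __name__ == "__main__":
    print(describe_torch_install())
```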
With your environment ready, you can now proceed to create the three Python files for our project inside the my_ai_project directory.
Step 1: model.py - Architecting the AI's Brain and Dictionary
This file defines our AI's core structure. It contains the Vocab class to manage words and the EncoderRNN and DecoderRNN neural networks that form its "brain."
# Create this file as: my_ai_project/model.py
import torch
import torch.nn as nn

# --- Vocabulary Class ---
# Define special tokens that will be part of our vocabulary.
PAD_token = 0  # Used to pad shorter sentences to a standard length.
SOS_token = 1  # "Start Of Sentence" token. Marks the beginning of an input or output.
EOS_token = 2  # "End Of Sentence" token. Marks the end.


class Vocab:
    """
    A class to manage the vocabulary and the mapping between words and numerical indices.
    This is crucial because neural networks work with numbers, not text.
    """

    def __init__(self):
        # Initialize mappings and word counts. Start with special tokens.
        self.word2index = {"<PAD>": PAD_token, "<SOS>": SOS_token, "<EOS>": EOS_token}
        self.index2word = {PAD_token: "<PAD>", SOS_token: "<SOS>", EOS_token: "<EOS>"}
        self.n_words = 3  # Start counting from 3 to account for the special tokens.

    def add_sentence(self, sentence):
        """Splits a sentence into words and adds them to the vocabulary."""
        for word in sentence.split(' '):
            self.add_word(word)

    def add_word(self, word):
        """Adds a new word to the vocabulary if it's not already there."""
        if word not in self.word2index:
            # Assign a new index to the word and update the mappings.
            self.word2index[word] = self.n_words
            self.index2word[self.n_words] = word
            self.n_words += 1


# --- Neural Network Model Definition ---
class EncoderRNN(nn.Module):
    """The Encoder part of the Seq2Seq model. It reads and encodes the input sentence."""

    def __init__(self, input_size, hidden_size, device):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.device = device
        # Embedding layer: Turns word indices into dense vectors of a specified size.
        self.embedding = nn.Embedding(input_size, hidden_size)
        # GRU (Gated Recurrent Unit): A type of recurrent neural network that processes sequences.
        self.gru = nn.GRU(hidden_size, hidden_size)

    def forward(self, input, hidden):
        """Defines the forward pass of the encoder."""
        # The input is a word index. The embedding layer turns it into a vector.
        embedded = self.embedding(input).view(1, 1, -1)
        # The GRU processes the embedded vector and the previous hidden state.
        output, hidden = self.gru(embedded, hidden)
        return output, hidden

    def initHidden(self):
        """Initializes the hidden state with zeros."""
        return torch.zeros(1, 1, self.hidden_size, device=self.device)


class DecoderRNN(nn.Module):
    """The Decoder part of the Seq2Seq model. It generates the output sentence."""

    def __init__(self, hidden_size, output_size, device):
        super(DecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.device = device
        self.embedding = nn.Embedding(output_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)
        # Linear layer: Maps the GRU's output to the size of our vocabulary, giving a score for each word.
        self.out = nn.Linear(hidden_size, output_size)
        # LogSoftmax: Converts scores into log probabilities, which is suitable for the NLLLoss function.
        self.log_softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        """Defines the forward pass of the decoder."""
        output = self.embedding(input).view(1, 1, -1)
        output = torch.relu(output)  # Apply a ReLU activation function.
        output, hidden = self.gru(output, hidden)
        output = self.log_softmax(self.out(output[0]))
        return output, hidden

    def initHidden(self):
        """Initializes the hidden state with zeros."""
        return torch.zeros(1, 1, self.hidden_size, device=self.device)
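To see what the vocabulary produces for our single training pair, here is a standalone sketch that repeats the same indexing logic in plain Python, with no PyTorch import. The helper names build_vocab, encode, and decode are illustrative and are not part of model.py.

```python
PAD_token, SOS_token, EOS_token = 0, 1, 2


def build_vocab(sentences):
    """Assign an index to every new word, after the three special tokens."""
    word2index = {"<PAD>": PAD_token, "<SOS>": SOS_token, "<EOS>": EOS_token}
    for sentence in sentences:
        for word in sentence.split(" "):
            if word not in word2index:
                word2index[word] = len(word2index)
    return word2index


def encode(sentence, word2index):
    """Turn a sentence into a list of indices, ending with <EOS>."""
    return [word2index[word] for word in sentence.split(" ")] + [EOS_token]


def decode(indices, word2index):
    """Map indices back to words by inverting the dictionary."""
    index2word = {index: word for word, index in word2index.items()}
    return [index2word[i] for i in indices]


word2index = build_vocab(["hello", "hello to you too"])
print(word2index)
# {'<PAD>': 0, '<SOS>': 1, '<EOS>': 2, 'hello': 3, 'to': 4, 'you': 5, 'too': 6}
print(encode("hello", word2index))             # [3, 2]
print(encode("hello to you too", word2index))  # [3, 4, 5, 6, 2]
print(decode([3, 4, 5, 6, 2], word2index))
# ['hello', 'to', 'you', 'too', '<EOS>']
```

These index lists are the same numbers that train.py later wraps in tensors, so the whole vocabulary has seven entries and the networks only ever see integers from 0 to 6.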
Step 2: train.py - Sending the AI to School
This script handles the entire training process. It loads data, initializes the models, and runs a training loop. In each loop, it feeds the model our "hello" example, calculates how "wrong" the model's prediction is (the Loss), and then adjusts the model's internal weights to make it a little bit better next time.
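To make "how wrong" concrete, here is a small pure-Python sketch of the loss for a single decoding step: a log-softmax over the word scores followed by the negative log-likelihood of the correct word. That is the same arithmetic nn.LogSoftmax and nn.NLLLoss perform together. The scores are made-up numbers for illustration, not outputs of the real model.

```python
import math


def log_softmax(scores):
    """Convert raw scores into log probabilities (numerically stable)."""
    highest = max(scores)
    log_total = highest + math.log(sum(math.exp(s - highest) for s in scores))
    return [s - log_total for s in scores]


def nll_loss(log_probs, target_index):
    """Negative log-likelihood of the correct word."""
    return -log_probs[target_index]


# Hypothetical scores for a 7-word vocabulary; index 3 ("hello") is the target.
untrained_scores = [0.0] * 7                                # no preference at all
trained_scores = [0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0]        # strongly prefers index 3

loss_before = nll_loss(log_softmax(untrained_scores), target_index=3)
loss_after = nll_loss(log_softmax(trained_scores), target_index=3)
print(f"uniform guess:   {loss_before:.4f}")  # ln(7), about 1.9459
print(f"confident guess: {loss_after:.4f}")   # much closer to 0
```

A model that spreads its belief evenly over seven words pays a loss of ln(7). As the score for the right word grows, the loss shrinks toward zero, which is the downward trend you should see printed during training.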
# Create this file as: my_ai_project/train.py
import torch
import torch.optim as optim
import torch.nn as nn
import os

# Import our custom classes and tokens from model.py
from model import Vocab, EncoderRNN, DecoderRNN, SOS_token, EOS_token


def train_step(input_tensor, target_tensor, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion, device):
    """Performs a single training step."""
    # Initialize the encoder's hidden state.
    encoder_hidden = encoder.initHidden()

    # Clear the gradients from the previous step.
    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()

    loss = 0

    # --- Encoding phase ---
    # Iterate through each word of the input sentence.
    for ei in range(input_tensor.size(0)):
        _, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)

    # --- Decoding phase ---
    # The decoder starts with the <SOS> token.
    decoder_input = torch.tensor([[SOS_token]], device=device)
    # The encoder's final hidden state is used as the decoder's initial hidden state.
    decoder_hidden = encoder_hidden

    # "Teacher Forcing": We feed the actual correct word from the target sentence
    # as the input to the decoder at each step. This helps the model learn faster.
    for di in range(target_tensor.size(0)):
        decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
        # Calculate the loss between the decoder's prediction and the true word.
        loss += criterion(decoder_output, target_tensor[di])
        # Set the next decoder input to the current correct word.
        decoder_input = target_tensor[di]
        # Stop if we've reached the end of the sentence.
        if decoder_input.item() == EOS_token:
            break

    # Backpropagation: Calculate the gradients of the loss with respect to model parameters.
    loss.backward()

    # Update the model's weights using the optimizers.
    encoder_optimizer.step()
    decoder_optimizer.step()

    # Return the average loss for this step.
    return loss.item() / target_tensor.size(0)


def main():
    # --- Hyperparameters and Settings ---
    hidden_size = 256
    learning_rate = 0.01
    n_epochs = 1000
    checkpoint_path = 'model_checkpoint.pth'

    # Automatically select a device (GPU if available, otherwise CPU).
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")

    # --- Data Preparation ---
    input_sentence = "hello"
    output_sentence = "hello to you too"

    vocab = Vocab()
    vocab.add_sentence(input_sentence)
    vocab.add_sentence(output_sentence)

    # Convert sentences to tensors of numerical indices.
    input_tensor = torch.tensor([vocab.word2index[word] for word in input_sentence.split(' ')] + [EOS_token], dtype=torch.long).view(-1, 1).to(device)
    target_tensor = torch.tensor([vocab.word2index[word] for word in output_sentence.split(' ')] + [EOS_token], dtype=torch.long).view(-1, 1).to(device)

    # --- Initialize Models and Optimizers ---
    encoder = EncoderRNN(vocab.n_words, hidden_size, device).to(device)
    decoder = DecoderRNN(hidden_size, vocab.n_words, device).to(device)
    encoder_optimizer = optim.SGD(encoder.parameters(), lr=learning_rate)
    decoder_optimizer = optim.SGD(decoder.parameters(), lr=learning_rate)
    criterion = nn.NLLLoss()  # Negative Log Likelihood Loss is suitable for this type of classification task.

    start_epoch = 0

    # --- Load checkpoint if it exists to resume training ---
    if os.path.exists(checkpoint_path):
        print(f"Found checkpoint '{checkpoint_path}', loading...")
        # Same fix as in inference.py: the checkpoint contains a custom Python object
        # (the Vocab instance), so we must pass `weights_only=False`. Recent PyTorch
        # releases (2.6 and later) default to `weights_only=True` and would refuse to load it.
        # This is safe only because we created the file ourselves.
        # `map_location` lets a checkpoint saved on a GPU load on a CPU-only machine.
        checkpoint = torch.load(checkpoint_path, map_location=device, weights_only=False)
        encoder.load_state_dict(checkpoint['encoder_state_dict'])
        decoder.load_state_dict(checkpoint['decoder_state_dict'])
        encoder_optimizer.load_state_dict(checkpoint['encoder_optimizer_state_dict'])
        decoder_optimizer.load_state_dict(checkpoint['decoder_optimizer_state_dict'])
        start_epoch = checkpoint['epoch'] + 1
        vocab = checkpoint['vocab']  # It's crucial to restore the vocabulary!
        print(f"Load successful! Resuming training from epoch {start_epoch}.")
    else:
        print("No checkpoint found, starting training from scratch.")

    # --- Training Loop ---
    print("\n--- Starting Training ---")
    for epoch in range(start_epoch, n_epochs):
        loss = train_step(input_tensor, target_tensor, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion, device)

        # Periodically print the loss and save a checkpoint.
        if (epoch + 1) % 100 == 0:
            print(f"Epoch {epoch+1}/{n_epochs}, Loss: {loss:.4f}")
            print("Saving checkpoint...")
            torch.save({
                'epoch': epoch,
                'encoder_state_dict': encoder.state_dict(),
                'decoder_state_dict': decoder.state_dict(),
                'encoder_optimizer_state_dict': encoder_optimizer.state_dict(),
                'decoder_optimizer_state_dict': decoder_optimizer.state_dict(),
                'vocab': vocab,  # Save the vocabulary along with the model.
            }, checkpoint_path)

    print("--- Training Complete ---")


if __name__ == '__main__':
    main()
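Teacher forcing is easier to see when the decoder's inputs and targets are written out side by side. The sketch below does that in plain Python, using words instead of tensor indices. The helper teacher_forcing_pairs is illustrative and not part of train.py.

```python
def teacher_forcing_pairs(target_words, sos="<SOS>", eos="<EOS>"):
    """Pair each decoder input with the word it should predict.

    The first input is <SOS>; every later input is the previous *correct*
    word, regardless of what the model actually predicted.
    """
    targets = list(target_words) + [eos]
    inputs = [sos] + targets[:-1]
    return list(zip(inputs, targets))


pairs = teacher_forcing_pairs("hello to you too".split(" "))
for step, (fed_in, expected) in enumerate(pairs, start=1):
    print(f"step {step}: input={fed_in!r:9} target={expected!r}")
```

This yields five steps, matching the five entries of target_tensor (four words plus the end marker). The last step teaches the decoder to emit the end-of-sentence token after "too", which is what lets the inference loop know when to stop.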
Step 3: inference.py - Graduation Day and a Real Job
After training, our AI is ready! This script loads the saved model and allows us to interact with it. This is where we see the fruits of our labor.
# Create this file as: my_ai_project/inference.py
import torch

# Import the necessary classes and tokens from model.py
from model import EncoderRNN, DecoderRNN, Vocab, SOS_token, EOS_token


def evaluate(encoder, decoder, sentence, vocab, device, max_length=10):
    """Generates a response from the model for a given input sentence."""
    # `torch.no_grad()` tells PyTorch we are not training, so it doesn't need to calculate gradients.
    with torch.no_grad():
        try:
            # Convert the input sentence into a tensor of indices.
            input_tensor = torch.tensor([vocab.word2index[word] for word in sentence.split(' ')] + [EOS_token], dtype=torch.long, device=device).view(-1, 1)
        except KeyError as e:
            # Handle cases where the input contains a word not in our vocabulary.
            return f"Error: The word {e} is not in the vocabulary."

        # --- Encoding ---
        encoder_hidden = encoder.initHidden()
        for ei in range(input_tensor.size(0)):
            _, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)

        # --- Decoding ---
        decoder_input = torch.tensor([[SOS_token]], device=device)
        decoder_hidden = encoder_hidden
        decoded_words = []

        # Generate the response word by word.
        for _ in range(max_length):
            decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
            # Get the word with the highest probability from the decoder's output.
            topv, topi = decoder_output.data.topk(1)
            if topi.item() == EOS_token:
                # If the model outputs the <EOS> token, stop generating.
                break
            else:
                decoded_words.append(vocab.index2word[topi.item()])
            # Use the predicted word as the next input to the decoder.
            decoder_input = topi.squeeze().detach()

        return ' '.join(decoded_words)


def main():
    # --- Settings ---
    hidden_size = 256
    checkpoint_path = 'model_checkpoint.pth'
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # --- Load Checkpoint ---
    print(f"Loading model from '{checkpoint_path}'...")
    try:
        # Load the saved checkpoint file.
        # This is where we fixed the bug!
        # We must set `weights_only=False` because our checkpoint contains a custom Python object (the Vocab class).
        # This is safe because we trust the source of this file (we created it ourselves).
        checkpoint = torch.load(checkpoint_path, map_location=device, weights_only=False)
    except FileNotFoundError:
        print("Error: Model file not found! Please run train.py first to train and save a model.")
        return
    except Exception as e:
        print(f"An error occurred while loading the checkpoint: {e}")
        print("This might be due to a PyTorch version mismatch or a corrupted file.")
        return

    # Restore the vocabulary and model weights from the checkpoint.
    vocab = checkpoint['vocab']
    encoder_state_dict = checkpoint['encoder_state_dict']
    decoder_state_dict = checkpoint['decoder_state_dict']

    # --- Initialize models and load the saved parameters ---
    encoder = EncoderRNN(vocab.n_words, hidden_size, device).to(device)
    decoder = DecoderRNN(hidden_size, vocab.n_words, device).to(device)
    encoder.load_state_dict(encoder_state_dict)
    decoder.load_state_dict(decoder_state_dict)

    # Set the models to evaluation mode. This disables layers like Dropout that are only used during training.
    encoder.eval()
    decoder.eval()

    print("Model loaded successfully! You can start chatting now (type 'quit' to exit).\n")

    # --- Interactive Chat Loop ---
    while True:
        input_sentence = input('> ')
        if input_sentence.lower() == 'quit':
            break
        # Get the model's response and print it.
        response = evaluate(encoder, decoder, input_sentence, vocab, device)
        print('<', response)


if __name__ == '__main__':
    main()
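The decoding loop in evaluate is a greedy search: at each step it keeps the single highest-scoring word and feeds it back in as the next input. The sketch below replays that control flow with a hard-coded score table standing in for the trained decoder. The table NEXT_WORD_SCORES and the function greedy_decode are invented for illustration. The real decoder also depends on its hidden state, not only on the previous word.

```python
EOS = "<EOS>"

# Hypothetical scores: for each input word, how strongly each next word is favoured.
NEXT_WORD_SCORES = {
    "<SOS>": {"hello": 0.9, "to": 0.05, EOS: 0.05},
    "hello": {"to": 0.8, "you": 0.1, EOS: 0.1},
    "to": {"you": 0.7, "too": 0.2, EOS: 0.1},
    "you": {"too": 0.85, "to": 0.05, EOS: 0.1},
    "too": {EOS: 0.9, "hello": 0.1},
}


def greedy_decode(max_length=10):
    """Pick the best-scoring next word until <EOS> or max_length is reached."""
    words = []
    current = "<SOS>"
    for _ in range(max_length):
        scores = NEXT_WORD_SCORES[current]
        best = max(scores, key=scores.get)  # same role as topk(1) in evaluate
        if best == EOS:
            break
        words.append(best)
        current = best  # feed the prediction back in as the next input
    return " ".join(words)


print(greedy_decode())              # hello to you too
print(greedy_decode(max_length=2))  # hello to
```

The max_length cap matters: if the model never learned to emit the end marker, the loop would otherwise run forever.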
The Final Result
After creating these three files, run the training script from your activated terminal:
# Make sure you are in the my_ai_project directory and your environment is active
# (ai_env) $
python train.py
You will see the Loss decreasing, indicating that the AI is learning. Once it's complete, run the inference script to chat with your new AI:
# (ai_env) $
python inference.py
Here's the expected interaction:
Loading model from 'model_checkpoint.pth'...
Model loaded successfully! You can start chatting now (type 'quit' to exit).
> hello
< hello to you too
> how are you
< Error: The word 'how' is not in the vocabulary.
> quit
It works! Our AI correctly responds to "hello", and the script tells us when it encounters a word outside its vocabulary. That second message comes from the KeyError handler in evaluate, not from the model itself.
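One caveat: because the scripts split the input on a single space, typing "Hello" with a capital letter, or "hello " with a trailing space, also triggers the vocabulary error. Here is a hedged sketch of a normalizing step that could run before the lookup. The helper normalize is hypothetical and not something inference.py currently contains.

```python
def normalize(sentence):
    """Lower-case the text and split on any run of whitespace."""
    # str.split() with no argument drops leading/trailing whitespace
    # and never produces empty strings.
    return sentence.lower().split()


known_words = {"hello", "to", "you", "too"}

# What the current scripts do with a trailing space: an empty "word" appears.
print("hello ".split(" "))  # ['hello', '']

for raw in ["hello", "Hello", "  hello ", "hello  to you"]:
    tokens = normalize(raw)
    unknown = [t for t in tokens if t not in known_words]
    print(repr(raw), "->", tokens, "| unknown:", unknown)
```

If you adopt something like this, apply the same normalization when building the vocabulary in train.py so that training and inference agree on what counts as a word.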
Conclusion: You've Built an AI
If you've followed along, you have just completed the entire lifecycle of a machine learning project, from setting up your environment to deploying a working application. The core concepts you've just implemented—data processing, model architecture, loss calculation, optimization, training loops, and inference—are the fundamental building blocks used in even the most advanced AI systems today.
AI is not magic. It's a field of engineering built on understandable principles. You've taken the most important step by building something yourself. Now, the world of AI is open for you to explore. What will you build next?