GPT-Neo 1.3B Enhanced for Code and Conversation

A fine-tuned version of GPT-Neo 1.3B optimized for both conversational AI and Python code generation. This model combines instruction-following capabilities with comprehensive Python programming knowledge through a multi-layer fine-tuning approach.

Model Description

Base Model: EleutherAI/gpt-neo-1.3B
Fine-tuning Approach: Multi-layer sequential training
Specializations: Conversation + Python Code Generation

Training Layers:

  1. Conversational Foundation: Fine-tuned on high-quality dialogue data for instruction-following
  2. Code Specialization: Enhanced with 362,059 Python code examples from CodeSearchNet dataset
  3. Integration: Maintains conversational abilities while adding strong coding capabilities

Training Details

  • Architecture: GPT-Neo 1.3B (transformer-based autoregressive language model)
  • Training Infrastructure: European HPC systems with AMD GPU acceleration
  • Distributed Training: Multi-GPU setup with gradient accumulation
  • Final Training Loss: 0.4554 (down from 0.9556 at the start of training)
  • CodeSearchNet Dataset: 362,059 high-quality Python code-documentation pairs
  • Training Duration: ~6 hours on 8x AMD MI250X GPUs
  • Optimization: AdamW optimizer with cosine annealing schedule

Capabilities

Code Generation

  • Python Functions: Complete implementations with proper documentation
  • Algorithm Development: Data structures, algorithms, and problem-solving
  • Code Explanation: Clear explanations of functionality and logic
  • Documentation: Automatic docstring and comment generation

Conversational AI

  • Instruction Following: Responds appropriately to coding requests
  • Technical Explanations: Breaks down complex programming concepts
  • Problem Solving: Helps debug and optimize code solutions
  • Educational Content: Teaches programming concepts step-by-step

Usage Examples

Python Code Generation

from transformers import GPTNeoForCausalLM, GPT2Tokenizer

model = GPTNeoForCausalLM.from_pretrained("raimondskrauklis/gpt-neo-1.3b-code-conversation")
tokenizer = GPT2Tokenizer.from_pretrained("raimondskrauklis/gpt-neo-1.3b-code-conversation")
tokenizer.pad_token = tokenizer.eos_token

# Code generation example
prompt = "Human: Write a Python function that calculates the factorial of a number\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Code Explanation

prompt = "Human: Explain how binary search works in Python\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=300, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Debugging Assistance

prompt = "Human: Why does this Python code give a list index error?\ncode: for i in range(len(data)+1): print(data[i])\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=250, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
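
The examples above print the full decoded sequence, which echoes the prompt itself. The helper below is a minimal sketch for trimming the output to just the assistant's reply; extract_reply is illustrative (not part of the model's API) and assumes the Human/Assistant prompt format shown above.

def extract_reply(full_text: str, prompt: str) -> str:
    # Drop the echoed prompt from the front of the decoded output.
    reply = full_text[len(prompt):] if full_text.startswith(prompt) else full_text
    # Cut at the next "Human:" turn if the model begins one.
    return reply.split("Human:")[0].strip()

print(extract_reply(response, prompt))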

Training Methodology

Multi-Layer Fine-tuning Strategy

  1. Base Selection: Started with EleutherAI's GPT-Neo 1.3B pre-trained model
  2. Layer 1 - Conversational: Fine-tuned on dialogue data for instruction-following
  3. Layer 2 - Code Enhancement: Specialized training on the CodeSearchNet Python dataset (see the data-preparation sketch below)
  4. Quality Assurance: Rigorous filtering for high-quality code-documentation pairs
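
The exact filtering criteria are not published with this card. The sketch below shows one plausible way to load and format the CodeSearchNet Python split with the datasets library, assuming the public code_search_net release and its func_code_string/func_documentation_string fields; the thresholds are illustrative only.

from datasets import load_dataset

# Public CodeSearchNet release on the Hugging Face Hub (Python split).
dataset = load_dataset("code_search_net", "python", split="train")

def is_high_quality(example):
    # Illustrative filter: require a non-trivial docstring and a body
    # short enough to fit the 512-token training window.
    return (len(example["func_documentation_string"].split()) >= 3
            and len(example["func_code_string"]) < 4000)

def to_prompt(example):
    # Format each pair in the Human/Assistant style used at inference time.
    return {"text": f"Human: {example['func_documentation_string']}\n"
                    f"Assistant:\n{example['func_code_string']}"}

dataset = dataset.filter(is_high_quality).map(to_prompt)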

Technical Implementation

  • Distributed Training: 8x AMD MI250X GPUs with proper CPU-GPU affinity
  • Batch Configuration: Per-device batch size of 4 with gradient accumulation
  • Learning Rate: 5e-6 with cosine annealing schedule
  • Sequence Length: 512 tokens maximum
  • Epochs: 2 passes over the full dataset (a configuration sketch follows this list)
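
A rough reconstruction of this configuration with the Hugging Face Trainer is sketched below. Only the hyperparameters listed above (learning rate 5e-6, cosine schedule, per-device batch size 4, 512-token sequences, 2 epochs) come from this card; the accumulation steps, precision, and output path are assumptions, and dataset refers to the formatted dataset from the previous sketch.

from transformers import (DataCollatorForLanguageModeling, GPT2Tokenizer,
                          GPTNeoForCausalLM, Trainer, TrainingArguments)

model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
tokenizer.pad_token = tokenizer.eos_token

def tokenize(example):
    # Truncate to the 512-token maximum sequence length used in training.
    return tokenizer(example["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="gpt-neo-1.3b-code-conversation",  # illustrative path
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,   # assumption; the card only notes accumulation was used
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    num_train_epochs=2,
    bf16=True,                       # assumption for MI250X hardware
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()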

Performance Metrics

  • Training Loss Progression: 0.9556 → 0.4554
  • Dataset Coverage: 362,059 Python code examples
  • Training Efficiency: ~11,315 batches per epoch
  • Model Size: ~5.3 GB (2 safetensors files)
  • Training Sequence Length: 512 tokens

Limitations

  • Language Focus: Primarily trained on Python code; limited coverage of other programming languages
  • Code Complexity: Best performance on functions under 100 lines
  • Validation Required: Generated code should be tested before production use
  • Knowledge Cutoff: Training data reflects pre-2024 coding practices
  • Context Window: Generation quality degrades beyond the 512-token training window (see the truncation sketch below)
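
Because quality degrades beyond the 512-token training window, long prompts can be truncated explicitly at tokenization time. A minimal sketch, reusing the model and tokenizer from the usage examples; the generation parameters are illustrative.

# Keep the prompt within the 512-token window used during fine-tuning.
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True,
                         temperature=0.7, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))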

Ethical Considerations

  • Code Review: All generated code should be reviewed for security and correctness
  • Bias Awareness: May reflect biases present in training data
  • Responsible Use: Not intended for malicious code generation
  • Attribution: Based on open-source datasets and models

Technical Specifications

  • Model Type: Causal language model (GPT-Neo architecture)
  • Parameters: 1.3 billion
  • Vocabulary Size: 50,257 tokens
  • Hidden Size: 2,048
  • Attention Heads: 16
  • Layers: 24
  • Context Length: 2,048 tokens (training used 512); see the configuration check below
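
These figures can be read directly from the model configuration; a short check, assuming the standard GPTNeoConfig attribute names in Transformers:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("raimondskrauklis/gpt-neo-1.3b-code-conversation")
print(config.hidden_size)              # 2048
print(config.num_heads)                # 16
print(config.num_layers)               # 24
print(config.vocab_size)               # 50257
print(config.max_position_embeddings)  # 2048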

Citation

@misc{gpt-neo-code-conversation-2025,
  title={GPT-Neo 1.3B Enhanced for Code and Conversation},
  author={Raimonds Krauklis},
  year={2025},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/raimondskrauklis/gpt-neo-1.3b-code-conversation},
  note={Fine-tuned on European HPC infrastructure using CodeSearchNet dataset}
}

Acknowledgments

  • Base Model: EleutherAI for GPT-Neo 1.3B
  • Dataset: CodeSearchNet by GitHub/Microsoft Research
  • Infrastructure: European high-performance computing systems
  • Framework: Hugging Face Transformers and PyTorch ecosystem

Model Card Contact

For questions about this model, please open an issue in the model repository or contact through Hugging Face.