## Model Overview
Pandas-Tutor-Gemma-2B is a fine-tuned version of Google's `gemma-2b-it` model, specialized for bidirectional tasks related to the Python pandas library. It was trained on a high-quality, curated dataset to become a reliable assistant for both novice and experienced developers.
The model excels at two primary functions:
- Code Generation: translates natural language instructions into precise `pandas` code.
- Code Explanation: describes the functionality of `pandas` code snippets in clear, easy-to-understand language.
This project demonstrates that with modern, parameter-efficient fine-tuning (PEFT) techniques, it's possible to create highly effective, specialized models on consumer-grade hardware.
## How to Use
You can use this model with the `transformers` library. Ensure you have `transformers`, `accelerate`, and `bitsandbytes` installed (for example via `pip install transformers accelerate bitsandbytes`).
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Model repository on Hugging Face
model_name = "csmishra952/Pandas-Tutor-Gemma-2B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

def get_response(instruction, input_text=""):
    """A helper function to format the prompt and generate a response."""
    # Gemma chat format; <bos> is written explicitly, so we disable the
    # tokenizer's automatic special tokens to avoid a duplicated <bos>.
    prompt = f"<bos><start_of_turn>user\n{instruction}\n{input_text}<end_of_turn>\n<start_of_turn>model\n"
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128)
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Keep only the model's turn of the conversation
    return result.split("model\n")[-1]

# --- Example 1: Code Explanation (Code-to-NL) ---
explanation = get_response(
    instruction="Explain what this Pandas code does.",
    input_text="df.groupby('department')['salary'].agg(['mean', 'max'])"
)
print("--- Code Explanation ---")
print(explanation)

# --- Example 2: Code Generation (NL-to-Code) ---
generated_code = get_response(
    instruction="Write the Pandas code to select all rows where the 'product_category' is 'Electronics' and the 'price' is less than 500."
)
print("\n--- Generated Code ---")
print(generated_code)
```
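If GPU memory is limited (for example on a free Colab T4), the model can also be loaded in 4-bit precision instead of bfloat16. The snippet below is an optional, illustrative variant using the standard `BitsAndBytesConfig` from `transformers` with `bitsandbytes`; it is not required for the example above and assumes the repository hosts full model weights, as that example implies.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_name = "csmishra952/Pandas-Tutor-Gemma-2B"

# Optional: quantize the weights to 4-bit NF4 at load time to save GPU memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
# The get_response() helper defined above can be reused as-is with this model.
```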
## Training Details

### Model Architecture

The model uses a Low-Rank Adaptation (LoRA) architecture. Instead of retraining the entire 2-billion-parameter base model, we freeze the base model and train only a small number of "adapter" matrices. This makes the training process incredibly efficient.

```
┌─────────────────────────────┐
│                             │
│    Frozen Gemma-2B Model    │
│    (2 Billion Parameters)   │
│                             │
└──────────────┬──────────────┘
               │
┌──────────────┴──────────────┐
│   Trainable LoRA Adapters   │
│    (~0.1% of Parameters)    │
└─────────────────────────────┘
```
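For reference, a setup like this can be sketched with the `peft`, `transformers`, and `bitsandbytes` libraries. This is only an illustrative sketch, not the exact training script: the rank, alpha, 4-bit NF4 quantization, and compute dtype come from the hyperparameter table below, while the `target_modules` selection is an assumption.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the frozen base model in 4-bit NF4 precision (the QLoRA part).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b-it",
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the small trainable LoRA adapter matrices on top of the frozen weights.
lora_config = LoraConfig(
    r=8,             # LoRA rank (from the table below)
    lora_alpha=32,   # LoRA alpha (from the table below)
    task_type="CAUSAL_LM",
    # Assumed choice of attention projections; not specified in this card.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # roughly 0.1% of parameters are trainable
```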
### Fine-Tuning Technique

The model was fine-tuned using QLoRA, which further optimizes LoRA by loading the base model in quantized 4-bit precision. This drastically reduces memory consumption, allowing the fine-tuning to be performed on a single T4 GPU in Google Colab.

### Training Hyperparameters

| Hyperparameter | Value |
| --- | --- |
| Base Model | google/gemma-2b-it |
| Fine-tuning Method | QLoRA |
| LoRA r (Rank) | 8 |
| LoRA alpha | 32 |
| Precision | 4-bit (nf4) |
| Compute dtype | bfloat16 |
| Optimizer | Paged AdamW (32-bit) |
| Learning Rate | 2e-4 |
| Epochs | 1 |
| Batch Size | 1 per device |
| Gradient Accumulation | 8 |

### Training Data

The model was trained on a custom dataset of 181 high-quality examples derived from highly voted pandas questions and their accepted answers on Stack Overflow. The data was manually cleaned, verified, and structured into a bidirectional, instruction-following format (JSONL).

## License

This model is licensed under the MIT License. You are free to use, modify, and distribute this model for any purpose, including commercial use.

## Citation

If you use this model or find this project helpful in your own work, please consider citing it:

```bibtex
@misc{pandas_tutor_gemma_mishra,
  author       = {Chandrasekhar Mishra},
  title        = {Pandas-Tutor-Gemma-2B: A Specialized Code Assistant},
  year         = {2025},
  publisher    = {Hugging Face},
  journal      = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/csmishra952/Pandas-Tutor-Gemma-2B}}
}
```