AdbhutMOE-Coding-Finetuned - Fine-tuned Coding Assistant

This model is a fine-tuned version of the rohitnagareddy/AdbhutMOE Mixture-of-Experts (MoE) model, specialized for Python code generation and programming assistance tasks. It combines the efficiency of a sparse MoE architecture with domain-specific fine-tuning for coding applications.

💻 Model Description

  • Base Model: rohitnagareddy/AdbhutMOE (Custom MoE Architecture)
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Dataset: TokenBender/code_instructions_122k_alpaca_style - A comprehensive dataset of coding instructions and solutions
  • Architecture: Mixture-of-Experts with selective expert activation
  • Training: Optimized for instruction-based code generation with memory-efficient techniques

πŸ—οΈ Architecture Details

This model is based on a custom Mixture-of-Experts architecture; an illustrative configuration sketch follows the list:

  • Experts per Layer: 8 experts with 2 activated per token
  • Hidden Dimension: 256
  • Attention Heads: 4
  • Layers: 4
  • Vocabulary: Custom-trained tokenizer (~8K tokens)
  • Max Sequence Length: 512 tokens
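
For reference, the dimensions above map roughly onto transformers' MixtralConfig, since the base model is a custom Mixtral-based MoE (see Model Lineage). The sketch below is an illustrative reconstruction only; the intermediate (FFN) size and key/value head count are assumptions not stated on this card.

from transformers import MixtralConfig

# Illustrative reconstruction of the base architecture (assumptions noted inline).
config = MixtralConfig(
    vocab_size=8000,               # custom tokenizer, ~8K tokens
    hidden_size=256,
    num_hidden_layers=4,
    num_attention_heads=4,
    num_key_value_heads=4,         # assumption: no grouped-query attention stated
    intermediate_size=512,         # assumption: FFN size is not listed on the card
    num_local_experts=8,           # 8 experts per MoE layer
    num_experts_per_tok=2,         # 2 experts routed per token
    max_position_embeddings=512,   # maximum sequence length
)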

⚠️ Important Considerations

  • Verify All Code: Generated code may contain errors or be suboptimal. Always test and review thoroughly.
  • Security: Generated code has not been vetted for security vulnerabilities.
  • Educational Model: This is a proof-of-concept model demonstrating MoE fine-tuning techniques.
  • Limited Training: Model was trained with limited resources for demonstration purposes.

🚀 Usage

Basic Text Generation

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

model_id = "rohitnagareddy/AdbhutMOE-Coding-Finetuned"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Create a text generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

# Generate code
prompt = '''### Instruction:
Write a Python function that takes a list of integers and returns the sum of all even numbers in the list.

### Response:'''

response = pipe(prompt, max_new_tokens=150, temperature=0.2, do_sample=True)
print(response[0]["generated_text"])

Direct Model Usage

# For more control over generation
prompt = '''### Instruction:
Create a Python class for a simple calculator with basic arithmetic operations.

### Response:'''

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # move inputs to the model's device
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.3,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
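
Both examples use the Alpaca-style instruction template from the fine-tuning dataset. A small helper like the one below (a hypothetical convenience function, not part of the released code) keeps the formatting consistent:

def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the Alpaca-style template used during fine-tuning."""
    return f"### Instruction:\n{instruction}\n\n### Response:"

prompt = build_prompt("Write a Python function that reverses a string.")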

📊 Training Details

Fine-tuning Configuration

  • Training Steps: 500 (limited for demonstration)
  • Batch Size: 1 (with 8 gradient accumulation steps)
  • Learning Rate: 1e-4
  • Optimizer: Paged AdamW 8-bit
  • LoRA Rank: 8
  • LoRA Alpha: 16
  • Target Modules: All linear layers, including the MoE expert and gate projections (see the sketch below)
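
As a rough illustration, the configuration above corresponds to a PEFT/Transformers setup like the one sketched here. This is not the original training script; the dropout, logging, and precision settings are assumptions.

from peft import LoraConfig
from transformers import TrainingArguments

# Sketch of the fine-tuning setup described above (assumptions noted inline).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules="all-linear",   # PEFT >= 0.8: adapt all linear layers, incl. MoE experts and gates
    lora_dropout=0.05,             # assumption: not stated on the card
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="adbhutmoe-coding-lora",
    max_steps=500,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    optim="paged_adamw_8bit",      # requires bitsandbytes
    fp16=True,                     # assumption
    logging_steps=25,              # assumption
)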

Base Model Training

  • Pre-training Data: AG News dataset sample
  • Architecture: Custom Mixtral-based MoE
  • Training Steps: 100 (base model pre-training)

🎯 Performance Notes

  • Efficiency: Only 2 of the 8 experts in each MoE layer are active per token, so each forward pass uses a fraction of the total parameters
  • Memory: Supports memory-efficient training and inference (fp16 weights, LoRA adapters, 4-bit quantization)
  • Speed: Sparse expert activation gives lower per-token compute than a dense model with the same total parameter count (see the back-of-envelope below)
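
A quick back-of-envelope using the routing numbers from the Architecture Details section shows why sparse activation reduces per-token compute:

# With 8 experts per layer and 2 routed per token, only a quarter of the
# expert FFN parameters participate in any single forward pass.
num_experts, experts_per_token = 8, 2
active_fraction = experts_per_token / num_experts
print(f"Active expert fraction per token: {active_fraction:.0%}")  # 25%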

🔄 Model Lineage

  1. Base Architecture: Custom Mixtral MoE implementation
  2. Pre-training: Trained on AG News dataset sample
  3. Fine-tuning: LoRA adaptation on coding instruction dataset
  4. Optimization: 4-bit quantization support for efficient deployment (see the loading sketch below)
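
For step 4, a minimal loading sketch with bitsandbytes 4-bit quantization might look like the following (assuming a CUDA GPU and that the custom MoE code is compatible with quantized linear layers):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Sketch: load the model with NF4 4-bit weights to reduce memory at inference time.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_4bit = AutoModelForCausalLM.from_pretrained(
    "rohitnagareddy/AdbhutMOE-Coding-Finetuned",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)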

📈 Intended Use Cases

  • Code Generation: Creating Python functions and classes
  • Programming Education: Demonstrating coding concepts
  • Research: Studying MoE architectures for domain-specific tasks
  • Prototyping: Quick code snippet generation

🚫 Limitations

  • Limited Scope: Primarily trained on basic coding tasks
  • Language Focus: Optimized for Python; support for other programming languages is limited
  • Scale: Small model size limits complex reasoning capabilities
  • Training Data: Limited training iterations due to resource constraints

🤝 Contributing

This model serves as a foundation for further experimentation with MoE architectures in code generation. Contributions and improvements are welcome!


Fine-tuned by rohitnagareddy using LoRA on the AdbhutMOE architecture. This model demonstrates the application of parameter-efficient fine-tuning to Mixture-of-Experts models.
