AdbhutMOE-Coding-Finetuned - Fine-tuned Coding Assistant
This model is a fine-tuned version of the rohitnagareddy/AdbhutMOE Mixture-of-Experts (MoE) model, specialized for Python code generation and programming assistance tasks. It combines the efficiency of a sparse MoE architecture with domain-specific fine-tuning for coding applications.
Model Description
- Base Model: rohitnagareddy/AdbhutMOE (Custom MoE Architecture)
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Dataset: TokenBender/code_instructions_122k_alpaca_style, a comprehensive dataset of coding instructions and solutions (see the quick look below)
- Architecture: Mixture-of-Experts with selective expert activation
- Training: Optimized for instruction-based code generation with memory-efficient techniques
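For a quick look at the fine-tuning data, the dataset is available on the Hugging Face Hub. The column names are not documented in this card; Alpaca-style datasets typically use instruction/input/output fields, so check the printed schema.

```python
from datasets import load_dataset

# Quick inspection of the fine-tuning dataset referenced above.
# Alpaca-style datasets usually expose instruction/input/output columns,
# but verify against the printed schema.
ds = load_dataset("TokenBender/code_instructions_122k_alpaca_style")
print(ds)
```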
Architecture Details
This model is based on a custom Mixture-of-Experts architecture:
- Experts per Layer: 8 experts with 2 activated per token
- Hidden Dimension: 256
- Attention Heads: 4
- Layers: 4
- Vocabulary: Custom-trained tokenizer (~8K tokens)
- Max Sequence Length: 512 tokens
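For orientation, these hyperparameters map roughly onto the stock Hugging Face MixtralConfig as shown below. This is a sketch only: the card describes a custom implementation, and the intermediate_size value is an assumption that is not stated above.

```python
from transformers import MixtralConfig, MixtralForCausalLM

# Sketch of the architecture described above using the stock MixtralConfig.
# The intermediate_size is assumed; the actual custom implementation may differ.
config = MixtralConfig(
    vocab_size=8000,             # custom tokenizer, roughly 8K tokens
    hidden_size=256,
    intermediate_size=1024,      # assumed; not documented in this card
    num_hidden_layers=4,
    num_attention_heads=4,
    num_key_value_heads=4,
    num_local_experts=8,         # 8 experts per MoE layer
    num_experts_per_tok=2,       # top-2 routing per token
    max_position_embeddings=512,
)
model = MixtralForCausalLM(config)
print(f"Total parameters: {model.num_parameters():,}")
```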
Important Considerations
- Verify All Code: Generated code may contain errors or be suboptimal. Always test and review thoroughly.
- Security: Generated code has not been vetted for security vulnerabilities.
- Educational Model: This is a proof-of-concept model demonstrating MoE fine-tuning techniques.
- Limited Training: Model was trained with limited resources for demonstration purposes.
Usage
Basic Text Generation
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch
model_id = "rohitnagareddy/AdbhutMOE-Coding-Finetuned"
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)
# Create a text generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)
# Generate code
prompt = '''### Instruction:
Write a Python function that takes a list of integers and returns the sum of all even numbers in the list.
### Response:'''
response = pipe(prompt, max_new_tokens=150, temperature=0.2, do_sample=True)
print(response[0]["generated_text"])
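By default the pipeline echoes the prompt in generated_text. With the standard transformers text-generation pipeline you can pass return_full_text=False to get only the completion (assuming the custom model class does not change this behaviour):

```python
# Return only the newly generated completion, without the prompt.
response = pipe(prompt, max_new_tokens=150, temperature=0.2, do_sample=True,
                return_full_text=False)
print(response[0]["generated_text"])
```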
Direct Model Usage
# For more control over generation
prompt = '''### Instruction:
Create a Python class for a simple calculator with basic arithmetic operations.
### Response:'''
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.3,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id
    )
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
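For repeated use, it can help to wrap the Alpaca-style template shown above in a small helper and strip the prompt from the decoded output. This is a minimal sketch that reuses the model and tokenizer loaded earlier; the template markers follow the examples in this card.

```python
# Helpers for the "### Instruction:" / "### Response:" format used above.
def build_prompt(instruction: str) -> str:
    return f"### Instruction:\n{instruction}\n### Response:"

def extract_response(full_text: str) -> str:
    # Everything after the last "### Response:" marker is the model's answer.
    return full_text.split("### Response:")[-1].strip()

prompt = build_prompt("Write a Python function that reverses a string.")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100, temperature=0.3,
                             top_p=0.9, do_sample=True,
                             pad_token_id=tokenizer.pad_token_id)
print(extract_response(tokenizer.decode(outputs[0], skip_special_tokens=True)))
```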
Training Details
Fine-tuning Configuration
- Training Steps: 500 (limited for demonstration)
- Batch Size: 1 (with 8 gradient accumulation steps)
- Learning Rate: 1e-4
- Optimizer: Paged AdamW 8-bit
- LoRA Rank: 8
- LoRA Alpha: 16
- Target Modules: All linear layers including MoE experts and gates
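For reference, the configuration above might look roughly like this with the PEFT and transformers libraries. The dropout value and the target-module names are assumptions (the latter follow Mixtral-style layer naming); the actual training script may differ.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# Approximate LoRA setup matching the values above. Module names assume
# Mixtral-style naming: attention projections, expert MLPs w1/w2/w3, router gate.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,        # assumed; not stated in this card
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "w1", "w2", "w3", "gate"],
)

# Approximate trainer settings matching the values above.
training_args = TrainingArguments(
    output_dir="adbhutmoe-coding-lora",
    max_steps=500,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    optim="paged_adamw_8bit",
    fp16=True,
    logging_steps=25,
)
```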
Base Model Training
- Pre-training Data: AG News dataset sample
- Architecture: Custom Mixtral-based MoE
- Training Steps: 100 (base model pre-training)
Performance Notes
- Efficiency: The MoE architecture provides parameter efficiency while maintaining performance
- Memory: Optimized for memory-efficient inference and training
- Speed: Sparse activation (only 2 of 8 experts run per token) enables faster inference than a dense model of similar capability; see the sketch below
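As a rough illustration of the sparse-activation point, the following back-of-the-envelope calculation compares total versus per-token-active expert parameters for the 8-expert, top-2 configuration described above (the intermediate size is an assumption, as in the earlier config sketch).

```python
# Back-of-the-envelope comparison of total vs. active expert parameters.
hidden, intermediate, n_layers = 256, 1024, 4    # intermediate size is assumed
n_experts, top_k = 8, 2

expert_params = 3 * hidden * intermediate        # Mixtral-style expert MLP: w1, w2, w3
total_expert_params = n_layers * n_experts * expert_params
active_expert_params = n_layers * top_k * expert_params

print(f"Expert parameters (total):            {total_expert_params:,}")
print(f"Expert parameters (active per token): {active_expert_params:,} "
      f"({active_expert_params / total_expert_params:.0%})")
```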
Model Lineage
- Base Architecture: Custom Mixtral MoE implementation
- Pre-training: Trained on AG News dataset sample
- Fine-tuning: LoRA adaptation on coding instruction dataset
- Optimization: 4-bit quantization support for efficient deployment (see the loading sketch below)
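Since the card mentions 4-bit quantization support, a typical way to load the model in 4-bit with bitsandbytes looks like this (assuming the custom architecture works with standard transformers quantization):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# 4-bit NF4 loading via bitsandbytes; requires a CUDA GPU and the bitsandbytes package.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model_id = "rohitnagareddy/AdbhutMOE-Coding-Finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```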
Intended Use Cases
- Code Generation: Creating Python functions and classes
- Programming Education: Demonstrating coding concepts
- Research: Studying MoE architectures for domain-specific tasks
- Prototyping: Quick code snippet generation
Limitations
- Limited Scope: Primarily trained on basic coding tasks
- Language Focus: Optimized for Python, with limited support for other languages
- Scale: Small model size limits complex reasoning capabilities
- Training Data: Limited training iterations due to resource constraints
Contributing
This model serves as a foundation for further experimentation with MoE architectures in code generation. Contributions and improvements are welcome!
Fine-tuned by rohitnagareddy using LoRA on the AdbhutMOE architecture. This model demonstrates the application of parameter-efficient fine-tuning to Mixture-of-Experts models.