TinyLlama-1.1B-Chat LoRA Fine-Tuned Model
Overview
This repository contains a LoRA (Low-Rank Adaptation) fine-tuned version of the TinyLlama/TinyLlama-1.1B-Chat-v0.6 model, optimized for instruction-following and question-answering tasks. The model has been adapted using Parameter-Efficient Fine-Tuning (PEFT) techniques to specialize in conversational AI applications while maintaining the base model's general capabilities.
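A minimal usage sketch, assuming the adapter in this repository is applied with PEFT on top of the 8-bit quantized base model; ADAPTER_PATH is a placeholder for this repository's Hub id or a local path, and the prompt and generation settings are purely illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v0.6"
ADAPTER_PATH = "path/to/this-lora-adapter"  # placeholder: this repo's Hub id or a local directory

# Load the tokenizer and the 8-bit quantized base model
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Attach the LoRA adapter on top of the frozen base weights
model = PeftModel.from_pretrained(base_model, ADAPTER_PATH)
model.eval()

# Build a chat-style prompt with the tokenizer's chat template and generate a reply
messages = [{"role": "user", "content": "Explain LoRA fine-tuning in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because only the low-rank adapter weights are loaded on top of the quantized base model, the memory footprint stays close to that of the 1.1B base model alone.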
Model Architecture
- Base Model: TinyLlama-1.1B-Chat (Transformer-based)
- Layers: 22
- Attention Heads: 32
- Hidden Size: 2048
- Context Length: 2048 tokens (limited to 256 during fine-tuning)
- Vocab Size: 32,000
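The figures above can be cross-checked against the base model's published configuration; a small sketch using the Transformers config API:

```python
from transformers import AutoConfig

# Pull the base model's configuration and print the fields listed above
config = AutoConfig.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v0.6")
print("Layers:         ", config.num_hidden_layers)
print("Attention heads:", config.num_attention_heads)
print("Hidden size:    ", config.hidden_size)
print("Context length: ", config.max_position_embeddings)
print("Vocab size:     ", config.vocab_size)
```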
Key Features
- Parameter-Efficient Fine-Tuning: only 0.39% of parameters (4.2M) trained (see the configuration sketch after this list)
- Memory Optimization: 8-bit quantization via BitsAndBytes
- Fast Inference: optimized for conversational response times
- Instruction-Tuned: specialized for Q&A and instructional tasks
- Modular Design: easy to adapt for different use cases
- Hugging Face Integration: fully compatible with the Transformers ecosystem
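A hedged sketch of how such an adapter can be set up with PEFT and bitsandbytes; the rank, alpha, dropout, and target modules below are illustrative assumptions, not the adapter's confirmed hyperparameters:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model with 8-bit weights to keep memory usage low
base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v0.6",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

# Illustrative LoRA settings; the actual values used for this adapter may differ
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # reports how few parameters are actually trainable
```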
Installation
Prerequisites
- Python 3.8+
- PyTorch 2.0+ (with CUDA 11.7+ if GPU acceleration is desired)
- NVIDIA GPU (recommended for training and inference)
Package Installation
```bash
pip install torch transformers peft accelerate bitsandbytes pandas datasets
```
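A quick, optional sanity check (assuming a Python environment with the packages above installed) to confirm the imports resolve and a GPU is visible:

```python
import torch
import transformers
import peft
import bitsandbytes

# Print versions and GPU availability to verify the environment
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("peft:", peft.__version__)
print("bitsandbytes:", bitsandbytes.__version__)
```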