TinyLlama-1.1B-Chat LoRA Fine-Tuned Model

(Figure: LoRA fine-tuning diagram)

Overview

This repository contains a LoRA (Low-Rank Adaptation) fine-tuned version of the TinyLlama/TinyLlama-1.1B-Chat-v0.6 model, optimized for instruction-following and question-answering tasks. The model has been adapted using Parameter-Efficient Fine-Tuning (PEFT) techniques to specialize in conversational AI applications while maintaining the base model's general capabilities.
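
A minimal sketch of loading the base model and attaching this LoRA adapter with PEFT is shown below. The adapter path is a placeholder for wherever this repository's adapter weights live (a local directory or a Hugging Face Hub repo id), and the prompt format should match whatever template was used during fine-tuning.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v0.6"
ADAPTER_PATH = "path/to/lora-adapter"  # placeholder for this repo's adapter weights

# Load tokenizer and base model, then attach the LoRA adapter on top.
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, ADAPTER_PATH)
model.eval()

# Simple generation; adjust the prompt to the template used during fine-tuning.
prompt = "Explain what LoRA fine-tuning is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```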

Model Architecture

  • Base Model: TinyLlama-1.1B-Chat (Transformer-based)
  • Layers: 22
  • Attention Heads: 32
  • Hidden Size: 2048
  • Context Length: 2048 tokens (limited to 256 during fine-tuning)
  • Vocab Size: 32,000
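
The figures above can be read directly from the base model's configuration; a quick check:

```python
from transformers import AutoConfig

# Architecture numbers come straight from the base model's config.
config = AutoConfig.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v0.6")
print(config.num_hidden_layers)        # 22 layers
print(config.num_attention_heads)      # 32 attention heads
print(config.hidden_size)              # 2048 hidden size
print(config.max_position_embeddings)  # 2048-token context window
print(config.vocab_size)               # 32,000-entry vocabulary
```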

Key Features

  • πŸš€ Parameter-Efficient Fine-Tuning: Only 0.39% of parameters (4.2M) trained (see the sketch after this list)
  • πŸ’Ύ Memory Optimization: 8-bit quantization via BitsAndBytes
  • ⚑ Fast Inference: Optimized for conversational response times
  • πŸ€– Instruction-Tuned: Specialized for Q&A and instructional tasks
  • πŸ”§ Modular Design: Easy to adapt for different use cases
  • πŸ“¦ Hugging Face Integration: Fully compatible with Transformers ecosystem
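
The sketch below shows how an 8-bit base model plus LoRA adapters of this kind is typically set up with Transformers and PEFT. The rank, alpha, and target modules are illustrative assumptions, not necessarily the exact values used for this adapter.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model with 8-bit weights via bitsandbytes.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v0.6",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters; only these low-rank matrices are trained.
# r, lora_alpha, and target_modules below are assumed values for illustration.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports a trainable fraction on the order of the 0.39% above
```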

Installation

Prerequisites

  • Python 3.8+
  • PyTorch 2.0+ (with CUDA 11.7+ if GPU acceleration is desired)
  • NVIDIA GPU (recommended for training and inference)

Package Installation

```bash
pip install torch transformers peft accelerate bitsandbytes pandas datasets
```
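
After installation, an optional sanity check confirms the key libraries import and reports GPU availability (nothing beyond the packages above is assumed):

```python
# Quick environment check: verify imports and GPU availability.
import torch
import transformers
import peft
import bitsandbytes  # may emit a warning on CPU-only machines

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("peft:", peft.__version__)
print("bitsandbytes:", bitsandbytes.__version__)
```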