---
license: apache-2.0
tags:
  - text-generation
  - llama.cpp
  - gguf
  - quantization
  - merged-model
language:
  - en
library_name: gguf
---

# merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct - GGUF Quantized Model

This is a collection of GGUF quantized versions of pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.

## 🌳 Model Tree

This model was created by merging the following models:

```
pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct
├── Merge Method: dare_ties
├── context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16
└── unsloth/Llama-3.2-3B-Instruct
    ├── density: 0.6
    └── weight: 0.5
```

**Merge Method:** DARE_TIES, a merging technique that randomly drops and rescales delta weights (DARE) and resolves sign conflicts between task vectors (TIES) to reduce interference between the source models.

πŸ“Š Available Quantization Formats

This repository contains multiple quantization formats optimized for different use cases (a download sketch follows the list):

- `q4_k_m`: 4-bit quantization, medium quality, good balance of size and performance
- `q5_k_m`: 5-bit quantization, higher quality, slightly larger size
- `q8_0`: 8-bit quantization, highest quality, larger size but minimal quality loss
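
If you prefer a programmatic download over `wget`, the snippet below is a minimal sketch using the `huggingface_hub` client; the repository id and filename match the ones used in the usage examples further down, and the function returns the local path of the cached file.

```python
from huggingface_hub import hf_hub_download

# Download one quantization (q4_k_m here); swap the filename suffix for q5_k_m or q8_0.
model_path = hf_hub_download(
    repo_id="pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct",
    filename="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf",
)
print(model_path)  # local path to the downloaded GGUF file
```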

πŸš€ Usage

### With llama.cpp

```bash
# Download a specific quantization
wget https://huggingface.co/pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct/resolve/main/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf

# Run with llama.cpp (recent builds name the CLI binary `llama-cli`; older builds use `main`)
./llama-cli -m merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf -p "Your prompt here"
```

### With Python (llama-cpp-python)

```python
from llama_cpp import Llama

# Load the model
llm = Llama(model_path="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf")

# Generate text
output = llm("Your prompt here", max_tokens=512)
print(output['choices'][0]['text'])
```
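
Since the underlying models are instruction-tuned, chat-style calls are usually a better fit than raw completion. The snippet below is a sketch that assumes the GGUF file carries the Llama 3.2 chat template in its metadata (llama-cpp-python then applies it automatically); the system prompt and token limit are illustrative.

```python
from llama_cpp import Llama

llm = Llama(model_path="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf")

# Chat-style generation using the chat template stored in the GGUF metadata
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Your prompt here"},
    ],
    max_tokens=512,
)
print(response["choices"][0]["message"]["content"])
```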

### With Ollama

```bash
# Create a Modelfile
echo 'FROM ./merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf' > Modelfile

# Create and run the model
ollama create merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct -f Modelfile
ollama run merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct "Your prompt here"
```
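
If you would rather call the model from Python instead of the CLI, the sketch below assumes the optional `ollama` Python client is installed (`pip install ollama`) and that the model has already been created with the `ollama create` command above; the prompt is a placeholder.

```python
import ollama

# Talk to a locally running Ollama server; the model name must match the one used in `ollama create`.
response = ollama.generate(
    model="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct",
    prompt="Your prompt here",
)
print(response["response"])
```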

πŸ“‹ Model Details

### 🎯 Recommended Usage

- `q4_k_m`: best for most use cases; good quality/size trade-off
- `q5_k_m`: when you need higher quality and have more storage/memory
- `q8_0`: when you want minimal quality loss relative to the original model

### ⚡ Performance Notes

GGUF models are optimized for:

- Faster loading times
- Lower memory usage
- CPU and GPU inference
- Cross-platform compatibility

For best performance, ensure your hardware supports the quantization format you choose.
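
As a rough illustration of how these considerations map onto loading options, the sketch below uses llama-cpp-python parameters that are commonly tuned for speed and memory; the values are placeholders to adjust for your hardware, not measured recommendations for this model.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf",
    n_ctx=4096,       # context window; larger values need more memory
    n_threads=8,      # CPU threads used for generation
    n_batch=256,      # prompt-processing batch size
    n_gpu_layers=20,  # layers offloaded to the GPU (0 = CPU only, -1 = offload all)
)
```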


This model was automatically quantized using the Lemuru LLM toolkit.