---
license: apache-2.0
tags:
- text-generation
- llama.cpp
- gguf
- quantization
- merged-model
language:
- en
library_name: gguf
---

# merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct - GGUF Quantized Model

This is a collection of GGUF quantized versions of [pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct](https://huggingface.co/pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct).

## Model Tree

This model was created by merging the following models:

```
pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct
├── Merge Method: dare_ties
├── context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16
├── unsloth/Llama-3.2-3B-Instruct
├── density: 0.6
└── weight: 0.5
```

**Merge Method**: DARE_TIES, an advanced merging technique that combines DARE's random pruning and rescaling of parameter deltas with TIES-style sign-consensus merging, reducing interference between the source models.

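Merges of this kind are typically produced with [mergekit](https://github.com/arcee-ai/mergekit). The configuration below is only a sketch reconstructed from the model tree above, not the file actually used: in particular, the choice of base model and applying the same density/weight to the non-base model are assumptions.

```yaml
# Hypothetical mergekit configuration reconstructed from the model tree above.
# base_model choice and parameter placement are assumptions, not the original file.
merge_method: dare_ties
base_model: context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16
models:
  - model: context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16
  - model: unsloth/Llama-3.2-3B-Instruct
    parameters:
      density: 0.6
      weight: 0.5
dtype: float16
```
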
## Available Quantization Formats

This repository contains multiple quantization formats optimized for different use cases (a programmatic download example follows the list):

- **q4_k_m**: 4-bit quantization, medium quality, good balance of size and performance
- **q5_k_m**: 5-bit quantization, higher quality, slightly larger size
- **q8_0**: 8-bit quantization, highest quality, larger size but minimal quality loss

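If you prefer to fetch a single quantization programmatically rather than with `wget`, the `huggingface_hub` client can download one file from this repository. A minimal sketch; the filename follows the naming pattern used in the usage examples below and is assumed to match the files actually uploaded here:

```python
# Minimal sketch: download one quantization with huggingface_hub.
from huggingface_hub import hf_hub_download

repo_id = "pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct"
# Assumed to match the file naming used in the usage examples below.
filename = "merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf"

# Returns the local path of the downloaded (and cached) file.
local_path = hf_hub_download(repo_id=repo_id, filename=filename)
print(local_path)
```
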
## Usage

### With llama.cpp

```bash
# Download a specific quantization
wget https://huggingface.co/pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct/resolve/main/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf

# Run with llama.cpp (newer llama.cpp builds name this binary `llama-cli` instead of `main`)
./main -m merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf -p "Your prompt here"
```

### With Python (llama-cpp-python)

```python
from llama_cpp import Llama

# Load the model
llm = Llama(model_path="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf")

# Generate text
output = llm("Your prompt here", max_tokens=512)
print(output['choices'][0]['text'])
```

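Because the merged sources are instruction-tuned Llama 3.2 models, prompting through the chat API, which applies the chat template stored in the GGUF metadata, usually behaves better than raw completion. A minimal sketch, assuming the same q4_k_m file as above and illustrative parameter values:

```python
from llama_cpp import Llama

# Load the model; n_ctx is an illustrative value, not a requirement.
llm = Llama(
    model_path="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf",
    n_ctx=4096,
)

# create_chat_completion applies the model's built-in chat template.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what GGUF quantization does."},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```
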
### With Ollama

```bash
# Create a Modelfile
echo 'FROM ./merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf' > Modelfile

# Create and run the model
ollama create merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct -f Modelfile
ollama run merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct "Your prompt here"
```

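Once created in Ollama, the model can also be called from Python through the `ollama` client package. A minimal sketch, assuming the model name from the `ollama create` command above:

```python
# Minimal sketch using the ollama Python client (pip install ollama).
# Assumes the model name used in the `ollama create` command above.
import ollama

response = ollama.chat(
    model="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Your prompt here"}],
)
print(response["message"]["content"])
```
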
## Model Details

- **Original Model**: [pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct](https://huggingface.co/pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct)
- **Quantization Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
- **License**: Same as the original model
- **Use Cases**: Optimized for local inference, edge deployment, and resource-constrained environments

## Recommended Usage

- **q4_k_m**: Best for most use cases, good quality/size trade-off
- **q5_k_m**: When you need higher quality and have more storage/memory (rough size estimates are sketched below)
- **q8_0**: When you want minimal quality loss from the original model

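As a rough guide to what these formats cost in practice, file size can be estimated from the parameter count and the average bits per weight of each scheme. The sketch below uses typical llama.cpp bits-per-weight figures and a ~3.2B parameter count as assumptions; it is a back-of-the-envelope estimate, not a measurement of the files in this repository, and runtime memory additionally depends on context length.

```python
# Back-of-the-envelope size estimates for a ~3.2B-parameter model.
# Bits-per-weight values are typical for llama.cpp quants, not exact figures for these files.
PARAMS = 3.2e9
BITS_PER_WEIGHT = {
    "q4_k_m": 4.85,
    "q5_k_m": 5.69,
    "q8_0": 8.50,
}

for fmt, bpw in BITS_PER_WEIGHT.items():
    size_gb = PARAMS * bpw / 8 / 1024**3
    print(f"{fmt}: ~{size_gb:.1f} GiB on disk (plus KV cache at runtime)")
```
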
## Performance Notes

GGUF models are optimized for:
- Faster loading times
- Lower memory usage
- CPU and GPU inference
- Cross-platform compatibility

For best performance, ensure your hardware supports the quantization format you choose; a GPU-offload example is sketched below.

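When a supported GPU is available, llama.cpp can offload some or all layers to it while the rest run on the CPU. A minimal llama-cpp-python sketch; the offload, threading, and context values are illustrative, not tuned for this model:

```python
from llama_cpp import Llama

# n_gpu_layers=-1 offloads all layers to the GPU (requires a GPU-enabled llama-cpp-python build);
# set it to 0 for CPU-only inference. n_threads controls CPU threads for non-offloaded work.
llm = Llama(
    model_path="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf",
    n_gpu_layers=-1,
    n_threads=8,
    n_ctx=4096,
)

output = llm("Your prompt here", max_tokens=128)
print(output["choices"][0]["text"])
```
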
---

*This model was automatically quantized using the Lemuru LLM toolkit.*