---
license: apache-2.0
tags:
- text-generation
- llama.cpp
- gguf
- quantization
- merged-model
language:
- en
library_name: gguf
---

# merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct - GGUF Quantized Model

This is a collection of GGUF quantized versions of [pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct](https://huggingface.co/pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct).

## Model Tree

This model was created by merging the following models:

```
pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct
├── Merge Method: dare_ties
├── context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16
├── unsloth/Llama-3.2-3B-Instruct
├── density: 0.6
└── weight: 0.5
```

**Merge Method**: DARE_TIES, an advanced merging technique that combines DARE's random pruning and rescaling of parameter deltas with TIES-style sign-consensus merging, reducing interference between the source models.

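Merges of this kind are typically produced with [mergekit](https://github.com/arcee-ai/mergekit). The configuration below is only a sketch reconstructed from the model tree above, not the file actually used: in particular, the choice of base model and applying the same density/weight to the non-base model are assumptions.

```yaml
# Hypothetical mergekit configuration reconstructed from the model tree above.
# base_model choice and parameter placement are assumptions, not the original file.
merge_method: dare_ties
base_model: context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16
models:
  - model: context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16
  - model: unsloth/Llama-3.2-3B-Instruct
    parameters:
      density: 0.6
      weight: 0.5
dtype: float16
```
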
## Available Quantization Formats

This repository contains multiple quantization formats optimized for different use cases (a programmatic download example follows the list):

- **q4_k_m**: 4-bit quantization, medium quality, good balance of size and performance
- **q5_k_m**: 5-bit quantization, higher quality, slightly larger size
- **q8_0**: 8-bit quantization, highest quality, larger size but minimal quality loss

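If you prefer to fetch a single quantization programmatically rather than with `wget`, the `huggingface_hub` client can download one file from this repository. A minimal sketch; the filename follows the naming pattern used in the usage examples below and is assumed to match the files actually uploaded here:

```python
# Minimal sketch: download one quantization with huggingface_hub.
from huggingface_hub import hf_hub_download

repo_id = "pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct"
# Assumed to match the file naming used in the usage examples below.
filename = "merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf"

# Returns the local path of the downloaded (and cached) file.
local_path = hf_hub_download(repo_id=repo_id, filename=filename)
print(local_path)
```
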
## Usage

### With llama.cpp

```bash
# Download a specific quantization
wget https://huggingface.co/pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct/resolve/main/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf

# Run with llama.cpp (newer llama.cpp builds name this binary `llama-cli` instead of `main`)
./main -m merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf -p "Your prompt here"
```

### With Python (llama-cpp-python)

```python
from llama_cpp import Llama

# Load the model
llm = Llama(model_path="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf")

# Generate text
output = llm("Your prompt here", max_tokens=512)
print(output['choices'][0]['text'])
```

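Because the merged sources are instruction-tuned Llama 3.2 models, prompting through the chat API, which applies the chat template stored in the GGUF metadata, usually behaves better than raw completion. A minimal sketch, assuming the same q4_k_m file as above and illustrative parameter values:

```python
from llama_cpp import Llama

# Load the model; n_ctx is an illustrative value, not a requirement.
llm = Llama(
    model_path="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf",
    n_ctx=4096,
)

# create_chat_completion applies the model's built-in chat template.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what GGUF quantization does."},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```
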
### With Ollama

```bash
# Create a Modelfile
echo 'FROM ./merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf' > Modelfile

# Create and run the model
ollama create merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct -f Modelfile
ollama run merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct "Your prompt here"
```

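Once created in Ollama, the model can also be called from Python through the `ollama` client package. A minimal sketch, assuming the model name from the `ollama create` command above:

```python
# Minimal sketch using the ollama Python client (pip install ollama).
# Assumes the model name used in the `ollama create` command above.
import ollama

response = ollama.chat(
    model="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Your prompt here"}],
)
print(response["message"]["content"])
```
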
## Model Details

- **Original Model**: [pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct](https://huggingface.co/pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct)
- **Quantization Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
- **License**: Same as the original model
- **Use Cases**: Optimized for local inference, edge deployment, and resource-constrained environments

## Recommended Usage

- **q4_k_m**: Best for most use cases, good quality/size trade-off
- **q5_k_m**: When you need higher quality and have more storage/memory (rough size estimates are sketched below)
- **q8_0**: When you want minimal quality loss from the original model

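As a rough guide to what these formats cost in practice, file size can be estimated from the parameter count and the average bits per weight of each scheme. The sketch below uses typical llama.cpp bits-per-weight figures and a ~3.2B parameter count as assumptions; it is a back-of-the-envelope estimate, not a measurement of the files in this repository, and runtime memory additionally depends on context length.

```python
# Back-of-the-envelope size estimates for a ~3.2B-parameter model.
# Bits-per-weight values are typical for llama.cpp quants, not exact figures for these files.
PARAMS = 3.2e9
BITS_PER_WEIGHT = {
    "q4_k_m": 4.85,
    "q5_k_m": 5.69,
    "q8_0": 8.50,
}

for fmt, bpw in BITS_PER_WEIGHT.items():
    size_gb = PARAMS * bpw / 8 / 1024**3
    print(f"{fmt}: ~{size_gb:.1f} GiB on disk (plus KV cache at runtime)")
```
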
## Performance Notes

GGUF models are optimized for:
- Faster loading times
- Lower memory usage
- CPU and GPU inference
- Cross-platform compatibility

For best performance, ensure your hardware supports the quantization format you choose; a GPU-offload example is sketched below.

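When a supported GPU is available, llama.cpp can offload some or all layers to it while the rest run on the CPU. A minimal llama-cpp-python sketch; the offload, threading, and context values are illustrative, not tuned for this model:

```python
from llama_cpp import Llama

# n_gpu_layers=-1 offloads all layers to the GPU (requires a GPU-enabled llama-cpp-python build);
# set it to 0 for CPU-only inference. n_threads controls CPU threads for non-offloaded work.
llm = Llama(
    model_path="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf",
    n_gpu_layers=-1,
    n_threads=8,
    n_ctx=4096,
)

output = llm("Your prompt here", max_tokens=128)
print(output["choices"][0]["text"])
```
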
---

*This model was automatically quantized using the Lemuru LLM toolkit.*