---
license: apache-2.0
tags:
- text-generation
- llama.cpp
- gguf
- quantization
- merged-model
language:
- en
library_name: gguf
---

# merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct - GGUF Quantized Model

This is a collection of GGUF quantized versions of [pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct](https://huggingface.co/pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct).

## 🌳 Model Tree

This model was created by merging the following models:

```
pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct
├── Merge Method: dare_ties
├── context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16
└── unsloth/Llama-3.2-3B-Instruct
    ├── density: 0.6
    └── weight: 0.5
```

**Merge Method**: DARE_TIES, an advanced merging technique that reduces interference between the merged models.
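
Conceptually, dare_ties works on each source model's parameter delta (its difference from a shared base): the DARE step randomly drops part of each delta (keeping roughly `density` of the entries) and rescales the survivors, and the TIES step resolves sign conflicts before the weighted deltas are added back to the base. The sketch below is illustrative only, using hypothetical PyTorch tensors; it is not the exact code used to build this model.

```python
import torch

def dare_ties_merge(base, finetuned, density=0.6, weights=None):
    """Illustrative DARE-TIES merge of fine-tuned tensors onto a base tensor.

    A conceptual sketch only; not the exact code used to produce this model.
    """
    weights = weights or [0.5] * len(finetuned)
    deltas = []
    for ft, w in zip(finetuned, weights):
        delta = ft - base                                   # task vector relative to the base
        keep = (torch.rand_like(delta) < density).float()   # DARE: randomly keep ~`density` of the entries
        deltas.append(w * delta * keep / density)           # rescale survivors to keep the expected delta
    stacked = torch.stack(deltas)
    elected = torch.sign(stacked.sum(dim=0))                # TIES: elect one sign per parameter
    agree = (torch.sign(stacked) == elected).float()        # drop entries that conflict with the elected sign
    return base + (stacked * agree).sum(dim=0)

# Toy usage on a single tensor
base = torch.zeros(4)
merged = dare_ties_merge(
    base,
    [torch.tensor([0.2, -0.1, 0.3, 0.0]), torch.tensor([0.1, 0.2, -0.4, 0.1])],
)
```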

## 📊 Available Quantization Formats

This repository contains multiple quantization formats optimized for different use cases (a download sketch follows the list):

- **q4_k_m**: 4-bit quantization, medium quality, good balance of size and performance
- **q5_k_m**: 5-bit quantization, higher quality, slightly larger size
- **q8_0**: 8-bit quantization, highest quality, larger size but minimal quality loss
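
If you prefer a programmatic download over `wget`, a minimal sketch using `huggingface_hub` (assuming the q4_k_m file name used in the Usage examples below) looks like this:

```python
from huggingface_hub import hf_hub_download

repo_id = "pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct"
filename = "merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf"

# Download the chosen quantization and print the local path it was cached to
model_path = hf_hub_download(repo_id=repo_id, filename=filename)
print(model_path)
```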

## 🚀 Usage

### With llama.cpp

```bash
# Download a specific quantization
wget https://huggingface.co/pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct/resolve/main/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf

# Run with llama.cpp (recent builds name this binary llama-cli instead of main)
./main -m merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf -p "Your prompt here"
```

### With Python (llama-cpp-python)

```python
from llama_cpp import Llama

# Load the model
llm = Llama(model_path="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf")

# Generate text
output = llm("Your prompt here", max_tokens=512)
print(output['choices'][0]['text'])
```
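
Since this is an instruction-tuned model, chat-style generation often works better than raw completion because it can apply the chat template stored in the GGUF metadata. A minimal sketch, with `n_ctx=4096` chosen only as an example:

```python
from llama_cpp import Llama

# Load the model with an explicit context window (4096 is just an example value)
llm = Llama(
    model_path="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf",
    n_ctx=4096,
)

# Chat-style generation; llama-cpp-python can pick up the chat template from the GGUF metadata
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain what GGUF quantization does in two sentences."}],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```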

### With Ollama

```bash
# Create a Modelfile
echo 'FROM ./merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf' > Modelfile

# Create and run the model
ollama create merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct -f Modelfile
ollama run merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct "Your prompt here"
```

## 📋 Model Details

- **Original Model**: [pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct](https://huggingface.co/pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct)
- **Quantization Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
- **License**: Same as the original model
- **Use Cases**: Optimized for local inference, edge deployment, and resource-constrained environments

## 🎯 Recommended Usage

- **q4_k_m**: Best for most use cases, good quality/size trade-off
- **q5_k_m**: When you need higher quality and have more storage/memory
- **q8_0**: When you want minimal quality loss from the original model

## ⚡ Performance Notes

GGUF models are optimized for:
- Faster loading times
- Lower memory usage
- CPU and GPU inference
- Cross-platform compatibility

For best performance, ensure your hardware supports the quantization format you choose.
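
For example, a GPU-enabled build of `llama-cpp-python` can offload layers to the GPU; the sketch below assumes such a build and uses `n_gpu_layers=-1` to offload every layer:

```python
from llama_cpp import Llama

# n_gpu_layers=-1 offloads all layers to the GPU; use 0 to stay fully on the CPU
llm = Llama(
    model_path="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf",
    n_gpu_layers=-1,
)
```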

---

*This model was automatically quantized using the Lemuru LLM toolkit.*