prepping v0.2 release and IQ1_KT, world's smallest!
README.md
CHANGED
@@ -32,9 +32,12 @@ Compare with Perplexity of full size `Q8_0` 1016.623 GiB (8.504 BPW):

Final estimate: PPL = 2.9507 +/- 0.01468

-### * `IQ4_KS` 550.428 GiB (4.604 BPW)
Final estimate: PPL = 3.0438 +/- 0.01536

Special mix of `IQ4_KS` `ffn_(gate|up)_exps` and `IQ5_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn/shexp/first dense layer. `iq4_k` `token_embd` and `iq6_k` `output` "head".
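If you want to cook up a mix along these lines yourself, the per-tensor rules are usually written as regex=type pairs for ik_llama.cpp's `llama-quantize --custom-q`. This is a minimal sketch only, assuming that option and the usual Kimi-K2 GGUF tensor names; it is not necessarily the exact recipe script behind this release.

```bash
#!/usr/bin/env bash
# Hypothetical sketch -- not necessarily the exact rules used for this release.
# Assumes ik_llama.cpp's llama-quantize with --custom-q "regex=type,..." rules
# and the usual Kimi-K2 GGUF tensor names. Paths are placeholders.

custom="
# Token embedding and output head
token_embd\.weight=iq4_k
output\.weight=iq6_k
# Attention, shared expert, first dense layer
blk\..*\.attn_.*=iq5_ks
blk\..*\.ffn_.*_shexp\.weight=iq5_ks
blk\.0\.ffn_.*=iq5_ks
# Routed experts
blk\..*\.ffn_down_exps\.weight=iq5_ks
blk\..*\.ffn_(gate|up)_exps\.weight=iq4_ks
"
# Collapse the rule list into the comma-separated form --custom-q expects (GNU sed).
custom=$(echo "$custom" | grep -v '^#' | sed -Ez 's:\n+:,:g;s:,$::;s:^,::')

./build/bin/llama-quantize \
    --custom-q "$custom" \
    --imatrix /path/to/imatrix-Kimi-K2-Instruct.dat \
    /path/to/Kimi-K2-Instruct-BF16.gguf \
    /path/to/Kimi-K2-Instruct-IQ4_KS.gguf \
    IQ4_KS 24
```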
@@ -92,7 +95,7 @@ numactl -N 1 -m 1 \

</details>

-### * `IQ3_KS` 427.205 GiB (3.573 BPW)
Final estimate: PPL = 3.1395 +/- 0.01604

Special mix of `IQ3_KS` `ffn_(gate|up)_exps` and `IQ4_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn/shexp/first dense layer. `iq4_k` `token_embd` and `iq6_k` `output` "head".
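The `numactl -N 1 -m 1 \` context shown in these hunk headers is the start of the perplexity command each PPL figure comes from. A rough, hedged sketch of what such a run looks like; model path, thread count, and context size are placeholders, and the actual flags are in the collapsed `<details>` sections.

```bash
# Sketch only: pin the run to NUMA node 1 for both CPU and memory, matching the
# hunk context above. Model path and thread count are placeholders.
numactl -N 1 -m 1 \
    ./build/bin/llama-perplexity \
        -m /path/to/Kimi-K2-Instruct-IQ3_KS-00001-of-00010.gguf \
        -f wiki.test.raw \
        -c 512 \
        --threads 64 \
        --numa numactl
```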
@@ -151,7 +154,7 @@ numactl -N 1 -m 1 \

</details>

-### * `IQ2_KL` 345.687 GiB (2.892 BPW)
Final estimate: PPL = 3.2741 +/- 0.01689

Special mix with brand new *SOTA* `IQ2_KL` `ffn_(gate|up)_exps` and `IQ3_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn/shexp/first dense layer. `iq4_k` `token_embd` and `iq6_k` `output` "head".
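To double-check which quant type each tensor actually got in one of these mixes, the `gguf-dump` script from the `gguf` Python package can list tensors with their types. A quick sketch with placeholder paths; the shard naming below is a guess.

```bash
# Sketch: dump tensor info and grep the routed-expert tensors to see which
# quant type each one uses. File path is a placeholder for the first shard.
pip install gguf   # provides the gguf-dump helper script
gguf-dump /path/to/Kimi-K2-Instruct-IQ2_KL-00001-of-00008.gguf | grep -E 'ffn_(gate|up|down)_exps'
```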
@@ -209,7 +212,7 @@ numactl -N 1 -m 1 \

</details>

-### * `IQ2_KS` 286.624 GiB (2.398 BPW)
Final estimate: PPL = 3.7922 +/- 0.02045

Special mix with `IQ2_KS` `ffn_(gate|up)_exps` and brand new SOTA `IQ2_KL` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn/shexp/first dense layer. `iq4_k` `token_embd` and `iq6_k` `output` "head".
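For a quick sense of the size/quality trade-off, here is a small helper that puts the PPL numbers quoted above on a common scale as a percent increase over the `Q8_0` baseline (PPL 2.9507). The values are copied from this card; nothing is re-measured.

```bash
#!/usr/bin/env bash
# Relative perplexity increase vs. the Q8_0 baseline (PPL 2.9507), using the
# v0.1 figures quoted in this README.
baseline=2.9507
while read -r name ppl; do
    awk -v b="$baseline" -v p="$ppl" -v n="$name" \
        'BEGIN { printf "%-8s PPL %.4f  (+%.2f%% vs Q8_0)\n", n, p, (p/b - 1)*100 }'
done <<'EOF'
IQ4_KS 3.0438
IQ3_KS 3.1395
IQ2_KL 3.2741
IQ2_KS 3.7922
EOF
```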

Final estimate: PPL = 2.9507 +/- 0.01468

+## *UPDATING RECIPES*
+Updating to new, better (lower perplexity) recipes and adding the world's smallest Kimi-K2-Instruct-smol-IQ1_KT at 219.375 GiB (1.835 BPW). Please ask any questions in [this discussion here](https://huggingface.co/ubergarm/Kimi-K2-Instruct-GGUF/discussions/4), thanks!

+Look there for a graph with the new values. I'll update the model card after the dust has settled. The old versions are still available as described in the discussion.
+
+### * v0.1 `IQ4_KS` 550.428 GiB (4.604 BPW)
Final estimate: PPL = 3.0438 +/- 0.01536

Special mix of `IQ4_KS` `ffn_(gate|up)_exps` and `IQ5_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn/shexp/first dense layer. `iq4_k` `token_embd` and `iq6_k` `output` "head".
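As a sanity check on the GiB/BPW figures (including the new IQ1_KT at 219.375 GiB and 1.835 BPW), the size in bits divided by the bits-per-weight should land near Kimi-K2's roughly 1T parameters for every quant listed here. A back-of-the-envelope check:

```bash
#!/usr/bin/env bash
# Back-of-the-envelope consistency check: parameters ~= GiB * 2^30 bytes * 8 bits / BPW.
# All three entries should print roughly 1.03T, matching Kimi-K2's ~1T parameters.
for entry in "IQ1_KT 219.375 1.835" "IQ4_KS 550.428 4.604" "IQ2_KS 286.624 2.398"; do
    set -- $entry
    awk -v n="$1" -v gib="$2" -v bpw="$3" \
        'BEGIN { printf "%-8s ~%.3fT params\n", n, gib * 2^30 * 8 / bpw / 1e12 }'
done
```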

</details>

+### * v0.1 `IQ3_KS` 427.205 GiB (3.573 BPW)
Final estimate: PPL = 3.1395 +/- 0.01604

Special mix of `IQ3_KS` `ffn_(gate|up)_exps` and `IQ4_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn/shexp/first dense layer. `iq4_k` `token_embd` and `iq6_k` `output` "head".

</details>

+### * v0.1 `IQ2_KL` 345.687 GiB (2.892 BPW)
Final estimate: PPL = 3.2741 +/- 0.01689

Special mix with brand new *SOTA* `IQ2_KL` `ffn_(gate|up)_exps` and `IQ3_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn/shexp/first dense layer. `iq4_k` `token_embd` and `iq6_k` `output` "head".

</details>

+### * v0.1 `IQ2_KS` 286.624 GiB (2.398 BPW)
Final estimate: PPL = 3.7922 +/- 0.02045

Special mix with `IQ2_KS` `ffn_(gate|up)_exps` and brand new SOTA `IQ2_KL` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn/shexp/first dense layer. `iq4_k` `token_embd` and `iq6_k` `output` "head".
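To pull down just one of these mixes rather than the whole repo, a filtered `huggingface-cli download` along these lines normally works; the `--include` pattern below is only a guess at the folder layout, so check the repo's file listing first.

```bash
# Sketch: fetch only the IQ2_KL mix from the repo. The folder/shard pattern is
# an assumption -- verify the actual file names on the Files and versions tab.
pip install -U "huggingface_hub[cli]"
huggingface-cli download ubergarm/Kimi-K2-Instruct-GGUF \
    --include "IQ2_KL/*" \
    --local-dir ./Kimi-K2-Instruct-GGUF
```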