ubergarm committed on
Commit 8f23292 · 1 Parent(s): d76dcc6

prepping v0.2 release and IQ1_KT, world's smallest!

Files changed (1)
  1. README.md +8 -5
README.md CHANGED
@@ -32,9 +32,12 @@ Compare with Perplexity of full size `Q8_0` 1016.623 GiB (8.504 BPW):
 
 Final estimate: PPL = 2.9507 +/- 0.01468
 
- *NOTE*: More sizes coming soon as they finish cooking! Join in discussion to request your target size and share your findings!
+ ## *UPDATING RECIPES*
+ Updating to new, better (lower perplexity) recipes, including the world's smallest, Kimi-K2-Instruct-smol-IQ1_KT at 219.375 GiB (1.835 BPW). Please ask any questions in [this discussion here](https://huggingface.co/ubergarm/Kimi-K2-Instruct-GGUF/discussions/4), thanks!
 
- ### * `IQ4_KS` 550.428 GiB (4.604 BPW)
+ Look there for a graph with the new values. I'll update the model card after the dust has settled. Old versions are still available as described in the discussion.
+
+ ### * v0.1 `IQ4_KS` 550.428 GiB (4.604 BPW)
 Final estimate: PPL = 3.0438 +/- 0.01536
 
 Special mix of `IQ4_KS` `ffn_(gate|up)_exps` and `IQ5_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn/shexp/first dense layer. `iq4_k` `token_embd` and `iq6_k` `output` "head".
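
For reference, the GiB and BPW figures here are mutually consistent: the full size `Q8_0` line above implies roughly (1016.623 × 2^30 × 8) / 8.504 ≈ 1.03T total weights, and the new smol-IQ1_KT's 219.375 GiB over that many weights works out to (219.375 / 1016.623) × 8.504 ≈ 1.835 BPW, matching the quoted figure.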
@@ -92,7 +95,7 @@ numactl -N 1 -m 1 \
 
 </details>
 
- ### * `IQ3_KS` 427.205 GiB (3.573 BPW)
+ ### * v0.1 `IQ3_KS` 427.205 GiB (3.573 BPW)
 Final estimate: PPL = 3.1395 +/- 0.01604
 
 Special mix of `IQ3_KS` `ffn_(gate|up)_exps` and `IQ4_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn/shexp/first dense layer. `iq4_k` `token_embd` and `iq6_k` `output` "head".
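
The `Final estimate: PPL` figures above come from standard perplexity runs compared against the `Q8_0` baseline quoted at the top of the first hunk. A minimal sketch of such a run is shown below, assuming an ik_llama.cpp build of `llama-perplexity` and the usual `wiki.test.raw` corpus; the model path, context size, and thread count are placeholders rather than the exact settings behind these numbers.

```bash
# Minimal sketch only -- not the exact command used for the PPL figures above.
# Assumes an ik_llama.cpp build; paths and thread count are placeholders.
./build/bin/llama-perplexity \
    -m /path/to/Kimi-K2-Instruct-IQ3_KS.gguf \
    -f wiki.test.raw \
    --ctx-size 512 \
    --threads 64
```

The `+/-` term in each estimate is the statistical uncertainty the tool reports over the evaluated chunks.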
@@ -151,7 +154,7 @@ numactl -N 1 -m 1 \
 </details>
 
 
- ### * `IQ2_KL` 345.687 GiB (2.892 BPW)
+ ### * v0.1 `IQ2_KL` 345.687 GiB (2.892 BPW)
 Final estimate: PPL = 3.2741 +/- 0.01689
 
 Special mix with brand new *SOTA* `IQ2_KL` `ffn_(gate|up)_exps` and `IQ3_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn/shexp/first dense layer. `iq4_k` `token_embd` and `iq6_k` `output` "head".
@@ -209,7 +212,7 @@ numactl -N 1 -m 1 \
 
 </details>
 
- ### * `IQ2_KS` 286.624 GiB (2.398 BPW)
+ ### * v0.1 `IQ2_KS` 286.624 GiB (2.398 BPW)
 Final estimate: PPL = 3.7922 +/- 0.02045
 
 Special mix with `IQ2_KS` `ffn_(gate|up)_exps` and brand new SOTA `IQ2_KL` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn/shexp/first dense layer. `iq4_k` `token_embd` and `iq6_k` `output` "head".
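
To make the "Special mix" recipes above concrete, here is a rough sketch of how such a per-tensor mix is typically expressed with ik_llama.cpp's `llama-quantize` and its `--custom-q` regex=type rules. It is illustrative only: the regex rules, imatrix path, and file names are placeholders, not the exact recipe used to produce these GGUFs.

```bash
# Illustrative sketch only -- not the exact recipe behind these releases.
# Assumes ik_llama.cpp's llama-quantize with --custom-q "regex=type,..." rules;
# all paths are placeholders. The trailing IQ4_KS acts as the base type for
# any tensor not matched by a custom rule.
./build/bin/llama-quantize \
    --imatrix /path/to/imatrix.dat \
    --custom-q "\
token_embd\.weight=iq4_k,\
output\.weight=iq6_k,\
blk\..*\.attn_.*=iq5_ks,\
blk\..*\.ffn_down_exps\.weight=iq5_ks,\
blk\..*\.ffn_(gate|up)_exps\.weight=iq4_ks" \
    /path/to/Kimi-K2-Instruct-BF16.gguf \
    /path/to/Kimi-K2-Instruct-IQ4_KS.gguf \
    IQ4_KS
```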
 