prepping v0.2 release and IQ1_KT, world's smallest!
README.md
CHANGED
@@ -32,9 +32,12 @@ Compare with Perplexity of full size `Q8_0` 1016.623 GiB (8.504 BPW):

Final estimate: PPL = 2.9507 +/- 0.01468

-### * `IQ4_KS` 550.428 GiB (4.604 BPW)
Final estimate: PPL = 3.0438 +/- 0.01536

Special mix of `IQ4_KS` `ffn_(gate|up)_exps` and `IQ5_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn/shexp/first dense layer. `iq4_k` `token_embd` and `iq6_k` `output` "head".
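If you want to cook up a mix along these lines yourself, the per-tensor rules are usually written as regex=type pairs for ik_llama.cpp's `llama-quantize --custom-q`. This is a minimal sketch only, assuming that option and the usual Kimi-K2 GGUF tensor names; it is not necessarily the exact recipe script behind this release.

```bash
#!/usr/bin/env bash
# Hypothetical sketch -- not necessarily the exact rules used for this release.
# Assumes ik_llama.cpp's llama-quantize with --custom-q "regex=type,..." rules
# and the usual Kimi-K2 GGUF tensor names. Paths are placeholders.

custom="
# Token embedding and output head
token_embd\.weight=iq4_k
output\.weight=iq6_k
# Attention, shared expert, first dense layer
blk\..*\.attn_.*=iq5_ks
blk\..*\.ffn_.*_shexp\.weight=iq5_ks
blk\.0\.ffn_.*=iq5_ks
# Routed experts
blk\..*\.ffn_down_exps\.weight=iq5_ks
blk\..*\.ffn_(gate|up)_exps\.weight=iq4_ks
"
# Collapse the rule list into the comma-separated form --custom-q expects (GNU sed).
custom=$(echo "$custom" | grep -v '^#' | sed -Ez 's:\n+:,:g;s:,$::;s:^,::')

./build/bin/llama-quantize \
    --custom-q "$custom" \
    --imatrix /path/to/imatrix-Kimi-K2-Instruct.dat \
    /path/to/Kimi-K2-Instruct-BF16.gguf \
    /path/to/Kimi-K2-Instruct-IQ4_KS.gguf \
    IQ4_KS 24
```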
@@ -92,7 +95,7 @@ numactl -N 1 -m 1 \

</details>

-### * `IQ3_KS` 427.205 GiB (3.573 BPW)
Final estimate: PPL = 3.1395 +/- 0.01604

Special mix of `IQ3_KS` `ffn_(gate|up)_exps` and `IQ4_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn/shexp/first dense layer. `iq4_k` `token_embd` and `iq6_k` `output` "head".
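The `numactl -N 1 -m 1 \` context shown in these hunk headers is the start of the perplexity command each PPL figure comes from. A rough, hedged sketch of what such a run looks like; model path, thread count, and context size are placeholders, and the actual flags are in the collapsed `<details>` sections.

```bash
# Sketch only: pin the run to NUMA node 1 for both CPU and memory, matching the
# hunk context above. Model path and thread count are placeholders.
numactl -N 1 -m 1 \
    ./build/bin/llama-perplexity \
        -m /path/to/Kimi-K2-Instruct-IQ3_KS-00001-of-00010.gguf \
        -f wiki.test.raw \
        -c 512 \
        --threads 64 \
        --numa numactl
```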
@@ -151,7 +154,7 @@ numactl -N 1 -m 1 \

</details>

-### * `IQ2_KL` 345.687 GiB (2.892 BPW)
Final estimate: PPL = 3.2741 +/- 0.01689

Special mix with brand new *SOTA* `IQ2_KL` `ffn_(gate|up)_exps` and `IQ3_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn/shexp/first dense layer. `iq4_k` `token_embd` and `iq6_k` `output` "head".
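To double-check which quant type each tensor actually got in one of these mixes, the `gguf-dump` script from the `gguf` Python package can list tensors with their types. A quick sketch with placeholder paths; the shard naming below is a guess.

```bash
# Sketch: dump tensor info and grep the routed-expert tensors to see which
# quant type each one uses. File path is a placeholder for the first shard.
pip install gguf   # provides the gguf-dump helper script
gguf-dump /path/to/Kimi-K2-Instruct-IQ2_KL-00001-of-00008.gguf | grep -E 'ffn_(gate|up|down)_exps'
```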
@@ -209,7 +212,7 @@ numactl -N 1 -m 1 \

</details>

-### * `IQ2_KS` 286.624 GiB (2.398 BPW)
Final estimate: PPL = 3.7922 +/- 0.02045

Special mix with `IQ2_KS` `ffn_(gate|up)_exps` and brand new SOTA `IQ2_KL` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn/shexp/first dense layer. `iq4_k` `token_embd` and `iq6_k` `output` "head".
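For a quick sense of the size/quality trade-off, here is a small helper that puts the PPL numbers quoted above on a common scale as a percent increase over the `Q8_0` baseline (PPL 2.9507). The values are copied from this card; nothing is re-measured.

```bash
#!/usr/bin/env bash
# Relative perplexity increase vs. the Q8_0 baseline (PPL 2.9507), using the
# v0.1 figures quoted in this README.
baseline=2.9507
while read -r name ppl; do
    awk -v b="$baseline" -v p="$ppl" -v n="$name" \
        'BEGIN { printf "%-8s PPL %.4f  (+%.2f%% vs Q8_0)\n", n, p, (p/b - 1)*100 }'
done <<'EOF'
IQ4_KS 3.0438
IQ3_KS 3.1395
IQ2_KL 3.2741
IQ2_KS 3.7922
EOF
```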

Final estimate: PPL = 2.9507 +/- 0.01468

+## *UPDATING RECIPES*
+Updating to new, better (lower perplexity) recipes and adding the world's smallest Kimi-K2-Instruct-smol-IQ1_KT at 219.375 GiB (1.835 BPW). Please ask any questions in [this discussion here](https://huggingface.co/ubergarm/Kimi-K2-Instruct-GGUF/discussions/4), thanks!

+Look there for a graph with the new values. I'll update the model card after the dust has settled. The old versions are still available as described in the discussion.
+
+### * v0.1 `IQ4_KS` 550.428 GiB (4.604 BPW)
Final estimate: PPL = 3.0438 +/- 0.01536

Special mix of `IQ4_KS` `ffn_(gate|up)_exps` and `IQ5_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn/shexp/first dense layer. `iq4_k` `token_embd` and `iq6_k` `output` "head".
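As a sanity check on the GiB/BPW figures (including the new IQ1_KT at 219.375 GiB and 1.835 BPW), the size in bits divided by the bits-per-weight should land near Kimi-K2's roughly 1T parameters for every quant listed here. A back-of-the-envelope check:

```bash
#!/usr/bin/env bash
# Back-of-the-envelope consistency check: parameters ~= GiB * 2^30 bytes * 8 bits / BPW.
# All three entries should print roughly 1.03T, matching Kimi-K2's ~1T parameters.
for entry in "IQ1_KT 219.375 1.835" "IQ4_KS 550.428 4.604" "IQ2_KS 286.624 2.398"; do
    set -- $entry
    awk -v n="$1" -v gib="$2" -v bpw="$3" \
        'BEGIN { printf "%-8s ~%.3fT params\n", n, gib * 2^30 * 8 / bpw / 1e12 }'
done
```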

</details>

+### * v0.1 `IQ3_KS` 427.205 GiB (3.573 BPW)
Final estimate: PPL = 3.1395 +/- 0.01604

Special mix of `IQ3_KS` `ffn_(gate|up)_exps` and `IQ4_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn/shexp/first dense layer. `iq4_k` `token_embd` and `iq6_k` `output` "head".

</details>

+### * v0.1 `IQ2_KL` 345.687 GiB (2.892 BPW)
Final estimate: PPL = 3.2741 +/- 0.01689

Special mix with brand new *SOTA* `IQ2_KL` `ffn_(gate|up)_exps` and `IQ3_KS` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn/shexp/first dense layer. `iq4_k` `token_embd` and `iq6_k` `output` "head".

</details>

+### * v0.1 `IQ2_KS` 286.624 GiB (2.398 BPW)
Final estimate: PPL = 3.7922 +/- 0.02045

Special mix with `IQ2_KS` `ffn_(gate|up)_exps` and brand new SOTA `IQ2_KL` `ffn_down_exps` routed experts. Mostly `iq5_ks/iq4_ks` for attn/shexp/first dense layer. `iq4_k` `token_embd` and `iq6_k` `output` "head".
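To pull down just one of these mixes rather than the whole repo, a filtered `huggingface-cli download` along these lines normally works; the `--include` pattern below is only a guess at the folder layout, so check the repo's file listing first.

```bash
# Sketch: fetch only the IQ2_KL mix from the repo. The folder/shard pattern is
# an assumption -- verify the actual file names on the Files and versions tab.
pip install -U "huggingface_hub[cli]"
huggingface-cli download ubergarm/Kimi-K2-Instruct-GGUF \
    --include "IQ2_KL/*" \
    --local-dir ./Kimi-K2-Instruct-GGUF
```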