Scored 64.29 at 4-bit on MMLU Pro, single shot, using ollama
#1, opened by xbruce22
Logs:
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Model | Dataset | Metric | Subset | Num | Score | Cat.0 |
+=========================================================+===========+=================+==================+=======+=========+=========+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro | AverageAccuracy | computer science | 10 | 0.4 | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro | AverageAccuracy | math | 10 | 0.9 | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro | AverageAccuracy | chemistry | 10 | 0.9 | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro | AverageAccuracy | engineering | 10 | 0.6 | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro | AverageAccuracy | law | 10 | 0.2 | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro | AverageAccuracy | biology | 10 | 0.9 | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro | AverageAccuracy | health | 10 | 0.8 | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro | AverageAccuracy | physics | 10 | 0.5 | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro | AverageAccuracy | business | 10 | 0.7 | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro | AverageAccuracy | philosophy | 10 | 0.6 | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro | AverageAccuracy | economics | 10 | 0.9 | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro | AverageAccuracy | other | 10 | 0.7 | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro | AverageAccuracy | psychology | 10 | 0.6 | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro | AverageAccuracy | history | 10 | 0.3 | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro | AverageAccuracy | OVERALL | 140 | 0.6429 | - |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
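As a quick sanity check on the OVERALL row, the per-subset accuracies in the table can be recombined by hand. This is only a sketch: the values are copied from the table above, and the variable and function names are made up for illustration, not taken from the evaluation tool that produced the log.

```python
# Per-subset accuracies copied from the table above (10 questions each).
subset_scores = {
    "computer science": 0.4,
    "math": 0.9,
    "chemistry": 0.9,
    "engineering": 0.6,
    "law": 0.2,
    "biology": 0.9,
    "health": 0.8,
    "physics": 0.5,
    "business": 0.7,
    "philosophy": 0.6,
    "economics": 0.9,
    "other": 0.7,
    "psychology": 0.6,
    "history": 0.3,
}
QUESTIONS_PER_SUBSET = 10  # the "Num" column in the table


def overall_accuracy(scores, n_per_subset):
    """Return (correct, total, accuracy) pooled over all subsets."""
    # Convert each accuracy back to a whole number of correct answers.
    correct = sum(round(acc * n_per_subset) for acc in scores.values())
    total = n_per_subset * len(scores)
    return correct, total, correct / total


correct, total, overall = overall_accuracy(subset_scores, QUESTIONS_PER_SUBSET)
print(f"{correct}/{total} = {overall:.4f}")  # prints "90/140 = 0.6429"
```

Because every subset has the same number of questions, the pooled accuracy equals the plain average of the 14 subset scores.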
ollama's own Q4 quantized version performed better in this case:
65.71, Q4_K_M, ~19GB (ollama's quant)
Judging by the file size, ollama's quant uses higher precision (at the least, it is not a standard Q4 model).
Yes, that could be it.
Overall, a good model. For a coding model, scoring close to ~65 on single-shot MMLU Pro is good.
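To put the two scores quoted in this thread (64.29 and 65.71) side by side, here is a rough sketch of how far apart they are in terms of questions. It assumes both runs used the same 140-question sample, which the thread does not state, and it uses a simple binomial standard error as a noise estimate; the function names are illustrative only.

```python
import math

N_QUESTIONS = 140  # assumption: both runs used the same 140-question sample


def correct_count(score_pct, n):
    """Convert a percentage score back to a whole number of correct answers."""
    return round(score_pct / 100 * n)


def standard_error_pct(score_pct, n):
    """Binomial standard error of an accuracy, in percentage points."""
    p = score_pct / 100
    return 100 * math.sqrt(p * (1 - p) / n)


autoround_score = 64.29  # AutoRound Q4_K_M, from the table in this thread
ollama_score = 65.71     # ollama's Q4_K_M quant, as quoted in this thread

gap_questions = (correct_count(ollama_score, N_QUESTIONS)
                 - correct_count(autoround_score, N_QUESTIONS))
se = standard_error_pct(autoround_score, N_QUESTIONS)

print(f"gap: {gap_questions} questions out of {N_QUESTIONS}")
print(f"standard error at this sample size: about {se:.1f} points")
```

Under these assumptions the gap between the two quants is 2 questions, which is smaller than the sampling noise of a 140-question run.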