Scored 64.29 at 4-bit on MMLU Pro (single shot) using ollama

#1
by xbruce22 - opened

logs

+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Model                                                   | Dataset   | Metric          | Subset           |   Num |   Score | Cat.0   |
+=========================================================+===========+=================+==================+=======+=========+=========+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro  | AverageAccuracy | computer science |    10 |  0.4    | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro  | AverageAccuracy | math             |    10 |  0.9    | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro  | AverageAccuracy | chemistry        |    10 |  0.9    | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro  | AverageAccuracy | engineering      |    10 |  0.6    | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro  | AverageAccuracy | law              |    10 |  0.2    | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro  | AverageAccuracy | biology          |    10 |  0.9    | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro  | AverageAccuracy | health           |    10 |  0.8    | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro  | AverageAccuracy | physics          |    10 |  0.5    | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro  | AverageAccuracy | business         |    10 |  0.7    | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro  | AverageAccuracy | philosophy       |    10 |  0.6    | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro  | AverageAccuracy | economics        |    10 |  0.9    | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro  | AverageAccuracy | other            |    10 |  0.7    | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro  | AverageAccuracy | psychology       |    10 |  0.6    | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro  | AverageAccuracy | history          |    10 |  0.3    | default |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound-Q4_K_M | mmlu_pro  | AverageAccuracy | OVERALL          |   140 |  0.6429 | -       |
+---------------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
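As a quick sanity check on the log above, the OVERALL row can be reproduced from the per-subset rows. This is only a sketch: the scores are copied from the table, and treating each score as "correct answers out of 10" is an assumption based on the Num column.

```python
# Per-subset accuracies copied from the log table above.
subset_scores = {
    "computer science": 0.4, "math": 0.9, "chemistry": 0.9,
    "engineering": 0.6, "law": 0.2, "biology": 0.9, "health": 0.8,
    "physics": 0.5, "business": 0.7, "philosophy": 0.6,
    "economics": 0.9, "other": 0.7, "psychology": 0.6, "history": 0.3,
}

# Assumption: every subset was scored on exactly 10 questions (the Num column).
SAMPLES_PER_SUBSET = 10

# Convert each accuracy back to a count of correct answers, then aggregate.
correct = sum(round(score * SAMPLES_PER_SUBSET) for score in subset_scores.values())
total = SAMPLES_PER_SUBSET * len(subset_scores)
overall = correct / total

print(f"{correct}/{total} = {overall:.4f}")  # prints "90/140 = 0.6429"
```

Because every subset has the same number of questions, the overall score is simply the mean of the 14 subset scores.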

ollama's own Q4 quantization performed better in this case:

65.71 with Q4_K_M, ~19 GB (ollama's quant)
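To put the two scores in perspective, here is a rough sketch that converts each percentage back into a number of correct answers. It assumes the ollama quant was evaluated on the same 140 questions as the run in the log, which is not stated in the thread; `correct_answers` is a hypothetical helper for illustration.

```python
# Assumption: both runs used the same 140-question sample (10 per subset).
NUM_QUESTIONS = 140

def correct_answers(score_percent: float, num_questions: int = NUM_QUESTIONS) -> int:
    """Convert a percentage score back to a whole number of correct answers."""
    return round(score_percent / 100 * num_questions)

autoround_correct = correct_answers(64.29)  # score from the log above
ollama_correct = correct_answers(65.71)     # score reported for ollama's quant

print(autoround_correct, ollama_correct, ollama_correct - autoround_correct)
# prints "90 92 2"
```

Under that assumption the gap is two questions out of 140, so it may well be within run-to-run noise.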

Intel org

If you look at the file size, ollama's quant uses higher precision (at the very least, it is not a standard Q4 model).
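The size argument can be made concrete with a back-of-the-envelope bits-per-weight estimate. This is a sketch under loud assumptions: "~19GB" is read as 19 × 10^9 bytes, the parameter count is taken as the nominal 30B from the model name, and the whole file is treated as weights (metadata and any higher-precision tensors are ignored). `bits_per_weight` is a hypothetical helper, not part of any tool.

```python
def bits_per_weight(file_size_gb: float, num_params_billion: float) -> float:
    """Average bits stored per parameter, treating the whole file as weights."""
    total_bits = file_size_gb * 1e9 * 8        # assumption: decimal gigabytes
    total_params = num_params_billion * 1e9    # assumption: nominal parameter count
    return total_bits / total_params

# ~19 GB file, nominal 30B parameters (both figures are approximations).
bpw = bits_per_weight(19, 30)
print(f"{bpw:.2f} bits per weight")
```

The thread does not give the file size of the AutoRound quant, so this only estimates ollama's side of the comparison.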

Yes, that could be it.

Overall, a good model. For a coding model, scoring close to ~65 on single-shot MMLU Pro is good.
