Scored 68.57 on MMLU Pro single shot using llama.cpp
#2 by xbruce22
Logs:
+------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Model | Dataset | Metric | Subset | Num | Score | Cat.0 |
+================================================+===========+=================+==================+=======+=========+=========+
| Intel-Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf | mmlu_pro | AverageAccuracy | computer science | 10 | 0.6 | default |
+------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Intel-Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf | mmlu_pro | AverageAccuracy | math | 10 | 0.9 | default |
+------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Intel-Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf | mmlu_pro | AverageAccuracy | chemistry | 10 | 0.8 | default |
+------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Intel-Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf | mmlu_pro | AverageAccuracy | engineering | 10 | 0.7 | default |
+------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Intel-Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf | mmlu_pro | AverageAccuracy | law | 10 | 0.2 | default |
+------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Intel-Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf | mmlu_pro | AverageAccuracy | biology | 10 | 0.9 | default |
+------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Intel-Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf | mmlu_pro | AverageAccuracy | health | 10 | 0.8 | default |
+------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Intel-Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf | mmlu_pro | AverageAccuracy | physics | 10 | 0.6 | default |
+------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Intel-Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf | mmlu_pro | AverageAccuracy | business | 10 | 0.7 | default |
+------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Intel-Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf | mmlu_pro | AverageAccuracy | philosophy | 10 | 0.6 | default |
+------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Intel-Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf | mmlu_pro | AverageAccuracy | economics | 10 | 0.8 | default |
+------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Intel-Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf | mmlu_pro | AverageAccuracy | other | 10 | 0.7 | default |
+------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Intel-Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf | mmlu_pro | AverageAccuracy | psychology | 10 | 0.8 | default |
+------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Intel-Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf | mmlu_pro | AverageAccuracy | history | 10 | 0.5 | default |
+------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Intel-Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf | mmlu_pro | AverageAccuracy | OVERALL | 140 | 0.6857 | - |
+------------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
The model performs differently when run on ollama (NVIDIA GPU) versus llama.cpp (CPU).
This looks related to the GEMM implementation. Does the ollama GPU path use FP32 or FP16 as the accumulator data type?
Ollama uses FP32.
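As an aside, the effect the question is getting at can be sketched in NumPy. This is illustrative only, not the actual ollama or llama.cpp kernels: it contrasts a long dot product accumulated in FP32 against the same dot product accumulated naively in FP16, which is where GEMM backends with different accumulator types can diverge numerically.

```python
import numpy as np

# Illustrative sketch: why the GEMM accumulator dtype matters.
# FP16 inputs in both cases; only the accumulator precision differs.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float16)
w = rng.standard_normal(4096).astype(np.float16)

# FP16 inputs with an FP32 accumulator (common on GPU GEMM paths).
acc_fp32 = np.dot(x.astype(np.float32), w.astype(np.float32))

# FP16 inputs with an FP16 accumulator: every partial sum is
# rounded back to half precision, so rounding error builds up.
acc_fp16 = np.float16(0.0)
for a, b in zip(x, w):
    acc_fp16 = np.float16(acc_fp16 + np.float16(a * b))

print("fp32 accumulator:", float(acc_fp32))
print("fp16 accumulator:", float(acc_fp16))
print("abs difference  :", abs(float(acc_fp32) - float(acc_fp16)))
```

Over a 4096-element dot product the two accumulators drift apart, and in a real model that drift compounds across layers, which is one plausible source of small benchmark differences between backends.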