Update README.md
Zamba2-1.2B-Instruct-v2 achieves leading instruction-following performance for a model of its size and surpasses significantly larger models. For instance, it outperforms Gemma-2-2b-it, a very strong model over 2x its size.

| Model | Size (B) | IFEval | BBH | GPQA | MATH (Hard) | MMLU Pro | MUSR | Aggregate |
|:-------|:------:|:--------:|:-----:|:------:|:-----------:|:----------:|:------:|:-----------:|
| Zamba2-1.2B-Instruct-v2 | 1.22 | 66.51 | 15.33 | 1.09 | 3.59 | 12.89 | 1.59 | 16.83 |
| Zamba2-1.2B-Instruct | 1.22 | 41.76 | 17.49 | 1.73 | 2.75 | 14.69 | 2.44 | 13.48 |
| Gemma-2-2b-it | 2.51 | 19.76 | 24.42 | 2.58 | 1.04 | 25.80 | 7.16 | 13.46 |
| SmolLM2-1.7B-Instruct | 1.71 | 53.00 | 18.30 | 3.51 | 4.89 | 20.51 | 4.53 | 17.46 |
| Qwen-2.5-1.5B-Instruct | 1.54 | 43.74 | 24.72 | 0.80 | 19.11 | 27.23 | 4.45 | 20.01 |
| Llama-3.2-1B-Instruct | 1.24 | 56.88 | 16.65 | 2.03 | 6.85 | 17.79 | 1.68 | 16.98 |

Due to its unique hybrid SSM architecture, Zamba2-1.2B-Instruct-v2 achieves extremely low inference latency and rapid generation with a significantly smaller memory footprint than comparable transformer-based models.
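The latency claim above can be checked locally with a small timing harness like the one below. This is only a sketch: the model id in the commented example is an assumption (check the Zyphra organization on the Hugging Face Hub for the exact repository name), and the helper works with any generation callable.

```python
import time


def tokens_per_second(generate_fn, n_new_tokens):
    """Time a single generation call and return throughput in tokens/sec.

    generate_fn  -- zero-argument callable that runs generation once
    n_new_tokens -- number of tokens the call is expected to produce
    """
    start = time.perf_counter()
    generate_fn()
    elapsed = time.perf_counter() - start
    return n_new_tokens / elapsed


# Hypothetical usage with Hugging Face transformers (model id is an
# assumption, not confirmed by this README):
#
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-1.2B-instruct-v2")
# model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-1.2B-instruct-v2")
# inputs = tokenizer("What factors contributed to the fall of Rome?",
#                    return_tensors="pt")
# tps = tokens_per_second(
#     lambda: model.generate(**inputs, max_new_tokens=100), 100)
# print(f"{tps:.1f} tokens/sec")
```

Running the same harness against a similarly sized transformer baseline gives a like-for-like throughput comparison on your own hardware.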