stijn-zyphra committed
Commit bd256dc · verified · 1 parent: ef0f6f8

Update README.md

Files changed (1): README.md (+7 −7)
README.md CHANGED
@@ -55,14 +55,14 @@ print((tokenizer.decode(outputs[0])))
 Zamba2-1.2B-Instruct-v2 achieves leading instruction-following performance for a model of its size and surpasses models of significantly larger size. For instance, Zamba2-1.2B-Instruct-v2 outperforms Gemma2-2B-Instruct, a very strong model over 2x its size.
 
 
-| Model | Size | IFEval | BBH | GPQA | MATH_hard | MMLU_pro | MUSR | Aggregate |
+| Model | Size (B) | IFEval | BBH | GPQA | MATH (Hard) | MMLU Pro | MUSR | Aggregate |
 |:-------|:------:|:--------:|:-----:|:------:|:-----------:|:----------:|:------:|:-----------:|
-| Zamba2-1.2B-Instruct-v2 | 1.22B | 66.51 | 15.33 | 1.09 | 3.59 | 12.89 | 1.59 | 16.83 |
-| Zamba2-1.2B-Instruct | 1.22B | 41.76 | 17.49 | 1.73 | 2.75 | 14.69 | 2.44 | 13.48 |
-| Gemma-2-2b-it | 2.51B | 19.76 | 24.42 | 2.58 | 1.04 | 25.80 | 7.16 | 13.46 |
-| SmolLM2-1.7B-Instruct | 1.71B | 53.00 | 18.30 | 3.51 | 4.89 | 20.51 | 4.53 | 17.46 |
-| Qwen-2.5-1.5B-Instruct | 1.54B | 43.74 | 24.72 | 0.80 | 19.11 | 27.23 | 4.45 | 20.01 |
-| Llama-3.2-1B-Instruct | 1.24B | 56.88 | 16.65 | 2.03 | 6.85 | 17.79 | 1.68 | 16.98 |
+| Zamba2-1.2B-Instruct-v2 | 1.22 | 66.51 | 15.33 | 1.09 | 3.59 | 12.89 | 1.59 | 16.83 |
+| Zamba2-1.2B-Instruct | 1.22 | 41.76 | 17.49 | 1.73 | 2.75 | 14.69 | 2.44 | 13.48 |
+| Gemma-2-2b-it | 2.51 | 19.76 | 24.42 | 2.58 | 1.04 | 25.80 | 7.16 | 13.46 |
+| SmolLM2-1.7B-Instruct | 1.71 | 53.00 | 18.30 | 3.51 | 4.89 | 20.51 | 4.53 | 17.46 |
+| Qwen-2.5-1.5B-Instruct | 1.54 | 43.74 | 24.72 | 0.80 | 19.11 | 27.23 | 4.45 | 20.01 |
+| Llama-3.2-1B-Instruct | 1.24 | 56.88 | 16.65 | 2.03 | 6.85 | 17.79 | 1.68 | 16.98 |
 
 Due to its unique hybrid SSM architecture, Zamba2-1.2B-Instruct-v2 achieves extremely low inference latency and rapid generation with a significantly smaller memory footprint than comparable transformer-based models.
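The Aggregate column is not defined in this snippet; the reported figures are consistent with an unweighted arithmetic mean of the six benchmark scores. A quick sanity check, using a few rows from the table above (the averaging rule itself is an assumption inferred from the numbers, not stated in the diff):

```python
# Check whether the Aggregate column matches the unweighted mean of the
# six benchmark scores (IFEval, BBH, GPQA, MATH (Hard), MMLU Pro, MUSR).
# NOTE: the mean-based aggregation rule is an assumption, not documented.
scores = {
    "Zamba2-1.2B-Instruct-v2": [66.51, 15.33, 1.09, 3.59, 12.89, 1.59],
    "Gemma-2-2b-it":           [19.76, 24.42, 2.58, 1.04, 25.80, 7.16],
    "Qwen-2.5-1.5B-Instruct":  [43.74, 24.72, 0.80, 19.11, 27.23, 4.45],
}
reported = {
    "Zamba2-1.2B-Instruct-v2": 16.83,
    "Gemma-2-2b-it": 13.46,
    "Qwen-2.5-1.5B-Instruct": 20.01,
}
for model, vals in scores.items():
    mean = round(sum(vals) / len(vals), 2)
    # Each computed mean matches the table's Aggregate value.
    print(f"{model}: computed {mean}, reported {reported[model]}")
```

All three rows reproduce the table's Aggregate values, which supports (but does not prove) the simple-mean interpretation.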