---
license: mit
base_model:
- meta-llama/Llama-3.1-8B
---

The model is derived from Llama-3.1-8B through pruning with LLM-Streamline **(Streamlining Redundant Layers to Compress Large Language Models, ICLR 2025 Spotlight)**. The entire training process required only 1.3B tokens.

Below are the evaluation results obtained with lm-eval:

|                | arc_c | arc_e | boolq | hellaswag | openbookqa | rte  | winogrande | Avg  |
|----------------|-------|-------|-------|-----------|------------|------|------------|------|
| Llama-3.1-8B   | 50.4  | 80.3  | 81.2  | 60.2      | 34.8       | 67.9 | 73.0       | 64.0 |
| Llama-3.1-5.4B | 42.1  | 72.2  | 78.0  | 54.3      | 27.2       | 62.8 | 71.0       | 58.2 |
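
The table above can in principle be reproduced with the lm-evaluation-harness Python API. The snippet below is a minimal sketch, not the authors' exact evaluation setup: the repo id is a placeholder, and the dtype, batch size, and few-shot settings are assumptions.

```python
# Minimal sketch for re-running the benchmarks above with lm-evaluation-harness.
# "your-org/Llama-3.1-5.4B" is a placeholder; replace it with the actual
# checkpoint path or Hub repo id. Settings such as dtype and batch size are
# illustrative and may differ from those used to produce the reported numbers.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=your-org/Llama-3.1-5.4B,dtype=bfloat16",
    tasks=[
        "arc_challenge",
        "arc_easy",
        "boolq",
        "hellaswag",
        "openbookqa",
        "rte",
        "winogrande",
    ],
    batch_size=8,
)

# Print per-task metrics (accuracy keys vary by task).
for task, metrics in results["results"].items():
    print(task, metrics)
```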