
fnlp/SmolLM-135M-MLA-d_kv_8-refactor
Text Generation • 0.1B
This is the MHA2MLA model published in the paper "Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-Based LLMs".