AI & ML interests
LLM
Recent Activity
Open-source weights of the Lorsa modules introduced in "Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition".
The MHA2MLA models (Llama-2-7B variants) published in the paper "Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs"; a loading sketch follows the list.
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
  Paper • 2502.14837 • Published • 4
- fnlp/Llama-2-7B-MLA-d_kv_16
  Text Generation • 6B • Updated • 29
- fnlp/Llama-2-7B-MLA-d_kv_32
  Text Generation • 6B • Updated • 4
- fnlp/Llama-2-7B-MLA-d_kv_64
  Text Generation • 7B • Updated • 5
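A minimal sketch of pulling one of these checkpoints with the transformers library. Only the repo id comes from the list above; whether the MLA variants need trust_remote_code (custom attention code) is an assumption, so check the model card before relying on it.

```python
# Minimal sketch: load an MHA2MLA checkpoint from the Hub with transformers.
# Assumption: the MLA variants may ship custom modeling code, hence
# trust_remote_code=True below; verify against the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fnlp/Llama-2-7B-MLA-d_kv_16"  # repo id from the list above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the dtype stored in the checkpoint
    trust_remote_code=True,  # assumption: custom MLA attention module
)

inputs = tokenizer("Multi-head latent attention reduces the KV cache by", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Generation itself goes through the ordinary generate API; the point of the MLA checkpoints is the compressed KV cache, not a different inference interface.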
Embodied task-planning models and papers; a download sketch follows the list:
- fnlp/Embodied_R1-ScienceWorld
  8B • Updated • 1
- fnlp/Embodied_Planner-R1-Alfworld
  8B • Updated • 2
- Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning
  Paper • 2506.23127 • Published • 1
- World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning
  Paper • 2503.10480 • Published • 54
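A minimal sketch of fetching one of these checkpoints locally with huggingface_hub. The repo id is taken from the list above; everything else is generic Hub usage rather than anything specific to these models.

```python
# Minimal sketch: download an embodied-planner checkpoint from the Hub.
# Only the repo id comes from the collection above; how the weights are
# meant to be loaded afterwards is not confirmed here, so check the model card.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("fnlp/Embodied_Planner-R1-Alfworld")
print("Checkpoint files saved under:", local_dir)
```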
The MHA2MLA models (SmolLM variants) published in the paper "Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs"; a back-of-envelope cache comparison follows the list.
- fnlp/SmolLM-135M-MLA-d_kv_8-refactor
  Text Generation • 0.1B • Updated • 8
- fnlp/SmolLM-135M-MLA-d_kv_32-refactor
  Text Generation • 0.1B • Updated • 5
- fnlp/SmolLM-135M-MLA-d_kv_16-refactor
  Text Generation • 0.1B • Updated
- fnlp/SmolLM-360M-MLA-d_kv_8-refactor
  Text Generation • 0.3B • Updated • 1
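The d_kv_* suffix plausibly denotes the dimension of the compressed latent that replaces each head's key/value pair; under that reading (which is an assumption based on the repo names, not stated on this page), a back-of-envelope comparison of per-token cache sizes:

```python
# Back-of-envelope: if d_kv is the per-head latent dimension replacing the
# separate key and value vectors (2 * head_dim values per head), the per-token
# cache shrinks to d_kv / (2 * head_dim) of its MHA size. Both this reading of
# the d_kv_* suffix and the head_dim value below are assumptions.
head_dim = 64  # hypothetical head dimension for a SmolLM-scale model

for d_kv in (8, 16, 32):
    ratio = d_kv / (2 * head_dim)
    print(f"d_kv={d_kv}: ~{ratio:.1%} of the standard KV cache")
```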