Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs Paper • 2503.16870 • Published Mar 21, 2025
Faster Re-translation Using Non-Autoregressive Model For Simultaneous Neural Machine Translation Paper • 2012.14681 • Published Dec 29, 2020
Infusing Future Information into Monotonic Attention Through Language Models Paper • 2109.03121 • Published Sep 7, 2021
Decision Attentive Regularization to Improve Simultaneous Speech Translation Systems Paper • 2110.15729 • Published Oct 13, 2021
FiE: Building a Global Probability Space by Leveraging Early Fusion in Encoder for Open-Domain Question Answering Paper • 2211.10147 • Published Nov 18, 2022
Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models Paper • 2403.09635 • Published Mar 14, 2024