Commit 836dd4d
Parent(s): 411991f

update arxiv url
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
 - muddformer
 license: mit
 ---
-In comparison with Pythia-1.4B, MUDDPythia-1.4B is a language model pretrained on the Pile with 300B tokens, using a simple yet effective method to address the limitations of residual connections and enhance cross-layer information flow in Transformers. Please see downstream evaluations and more details in the paper [(MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections)](https://arxiv.org). In addition, we open-source Jax training code on [(Github)](https://github.com/Caiyun-AI/MUDDFormer/).
+In comparison with Pythia-1.4B, MUDDPythia-1.4B is a language model pretrained on the Pile with 300B tokens, using a simple yet effective method to address the limitations of residual connections and enhance cross-layer information flow in Transformers. Please see downstream evaluations and more details in the paper [(MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections)](https://arxiv.org/abs/2502.12170). In addition, we open-source Jax training code on [(Github)](https://github.com/Caiyun-AI/MUDDFormer/).
 
 We recommend the <strong>compiled version</strong> of MUDDPythia with *torch.compile* for inference acceleration. Please refer to the Generation section for the compile implementation.
 
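For reference, a minimal sketch of the compiled-inference setup the README recommends. This is not the Generation section's code from the repo: the Hugging Face repo id and the need for `trust_remote_code=True` are assumptions for a custom architecture; consult the model card for the official recipe.

```python
# Hedged sketch: load MUDDPythia-1.4B and wrap it with torch.compile
# for faster inference, per the README's recommendation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Caiyun-AI/MUDDPythia-1.4B"  # assumed repo id, not confirmed by this commit
tokenizer = AutoTokenizer.from_pretrained(repo)
# trust_remote_code is assumed here because MUDDPythia is a custom architecture.
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
model.eval()

# torch.compile traces the forward pass and reuses the optimized graph,
# so later generate() calls run the compiled kernels.
model = torch.compile(model)

inputs = tokenizer("The Pile is", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```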