Hilbertmeng committed on
Commit 836dd4d · 1 Parent(s): 411991f

update arxiv url

Files changed (1): README.md (+1 -1)
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
   - muddformer
 license: mit
 ---
-In comparison with Pythia-1.4B, MUDDPythia-1.4B is a pretrained language model on the Pile with 300B tokens, which uses a simple yet effective method to address the limitations of residual connections and enhance cross-layer information flow in Transformers. Please see downstrem evaluations and more details in the paper[(MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections)](https://arxiv.org). In addition, we open-source Jax training code on [(Github)](https://github.com/Caiyun-AI/MUDDFormer/).
+In comparison with Pythia-1.4B, MUDDPythia-1.4B is a pretrained language model on the Pile with 300B tokens, which uses a simple yet effective method to address the limitations of residual connections and enhance cross-layer information flow in Transformers. Please see downstrem evaluations and more details in the paper[(MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections)](https://arxiv.org/abs/2502.12170). In addition, we open-source Jax training code on [(Github)](https://github.com/Caiyun-AI/MUDDFormer/).
 
 We recommend <strong>compiled version</strong> of MUDDPythia with *torch.compile* for inference acceleration. Please refer to Generation section for compile implementation.