Implement those changes, which often means modifying the self-attention layer, the order of the normalization layers, etc. Again, it is often useful to look at the similar architecture of already existing models in Transformers to get a better feeling for how your model should be implemented.
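As an illustration of the kind of architectural difference mentioned above, the sketch below contrasts post-layer-norm (the original Transformer ordering) with pre-layer-norm (used by e.g. GPT-2) around a self-attention sublayer. This is a minimal NumPy toy, not Transformers code: the attention here uses identity query/key/value projections purely to keep the ordering difference visible.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize the last dimension to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def self_attention(x):
    # Toy single-head self-attention with identity Q/K/V projections
    # (a stand-in for the real attention sublayer).
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def post_ln_block(x):
    # Post-LN (original Transformer): residual add, THEN normalize.
    return layer_norm(x + self_attention(x))

def pre_ln_block(x):
    # Pre-LN (e.g. GPT-2): normalize first, sublayer, then residual add.
    return x + self_attention(layer_norm(x))

x = np.random.default_rng(0).normal(size=(4, 8))
out_post = post_ln_block(x)
out_pre = pre_ln_block(x)
```

Both variants use the same sublayer, yet their outputs differ because of where the normalization sits; when porting a model, this ordering is exactly the kind of detail to verify against a similar existing implementation.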