Implement those changes, which often means modifying the self-attention layer, the order of the normalization
layers, and so on. Again, it is often useful to look at similar architectures among the models that already exist in Transformers to
get a better feeling for how your model should be implemented.
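One such difference is where layer normalization sits relative to the residual connection (pre-norm vs. post-norm). The sketch below illustrates a pre-norm block; the class name, sizes, and structure are illustrative only and not taken from any particular model in the library:

```python
import torch
from torch import nn


class PreNormBlock(nn.Module):
    """Minimal pre-norm transformer block: LayerNorm is applied *before*
    each sub-layer. A post-norm variant would instead normalize after the
    residual addition. Real models add dropout, attention masks, and
    config-driven sizes; this is only a sketch."""

    def __init__(self, hidden_size=64, num_heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(hidden_size)
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(hidden_size)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size),
            nn.GELU(),
            nn.Linear(4 * hidden_size, hidden_size),
        )

    def forward(self, x):
        # Pre-norm: normalize, run attention, then add the residual.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out
        # Same pattern for the feed-forward sub-layer.
        x = x + self.mlp(self.norm2(x))
        return x


block = PreNormBlock()
out = block(torch.randn(2, 10, 64))  # (batch, seq_len, hidden_size)
print(out.shape)
```

Comparing such a block against the corresponding module of an existing model makes it easy to spot exactly which lines your new architecture changes.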