It’s a mechanism that avoids storing one huge positional encoding matrix, which becomes very costly when the sequence length is large, by factorizing that matrix into smaller matrices.
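To make the saving concrete, here is a minimal sketch of one way such a factorization can work. The text does not say which scheme is used, so this assumes an axial-style layout: the sequence length is split as n1 * n2, the embedding width as d1 + d2, and the encoding for a position is the concatenation of one row from each small table. The class name, function names, and sizes are hypothetical, and the random values stand in for learned weights.

```python
import random


def make_table(rows, cols, rng):
    """Build a rows x cols table of random floats (stand-in for learned weights)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class FactorizedPositions:
    """Hypothetical axial-style factorization of a positional encoding matrix.

    Assumption: a full (n1 * n2) x (d1 + d2) matrix is replaced by two small
    tables, e1 of shape n1 x d1 and e2 of shape n2 x d2.
    """

    def __init__(self, n1, n2, d1, d2, seed=0):
        rng = random.Random(seed)
        self.n1, self.n2 = n1, n2
        self.d1, self.d2 = d1, d2
        self.e1 = make_table(n1, d1, rng)
        self.e2 = make_table(n2, d2, rng)

    def encode(self, pos):
        """Return the (d1 + d2)-dimensional encoding for position pos."""
        if not 0 <= pos < self.n1 * self.n2:
            raise IndexError("position out of range")
        # Row pos % n1 of the first table, concatenated with
        # row pos // n1 of the second table.
        return self.e1[pos % self.n1] + self.e2[pos // self.n1]

    def num_parameters(self):
        """Number of stored values across both small tables."""
        return self.n1 * self.d1 + self.n2 * self.d2


def full_matrix_parameters(seq_len, dim):
    """Number of stored values in an unfactorized seq_len x dim matrix."""
    return seq_len * dim


if __name__ == "__main__":
    pe = FactorizedPositions(n1=64, n2=64, d1=32, d2=32)
    print("full matrix:", full_matrix_parameters(64 * 64, 32 + 32))
    print("factorized: ", pe.num_parameters())
    print("encoding width:", len(pe.encode(1000)))
```

With these illustrative sizes, a full matrix for 4,096 positions and 64 dimensions would hold 262,144 values, while the two small tables hold 4,096 in total. Every position still maps to its own pair of rows, because each position corresponds to a different combination of `pos % n1` and `pos // n1`.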