The abstract from the paper is the following:
Transformer-based models are unable to process long sequences due to their self-attention operation, which scales
quadratically with the sequence length.
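The quadratic scaling mentioned in the abstract comes from the attention score matrix itself: for a sequence of length `n`, every token attends to every other token, producing an `n x n` matrix. Below is a minimal sketch (not code from the paper or the library) that makes this visible with plain scaled dot-product scores; the projections are omitted for brevity.

```python
import torch

def attention_scores(hidden_states: torch.Tensor) -> torch.Tensor:
    # hidden_states: (seq_len, hidden_size)
    seq_len, hidden_size = hidden_states.shape
    # Stand-ins for the learned query/key projections in real self-attention.
    q = hidden_states
    k = hidden_states
    # Score matrix has shape (seq_len, seq_len): memory and compute grow
    # quadratically with the sequence length.
    return q @ k.transpose(0, 1) / hidden_size ** 0.5

for n in (512, 1024, 2048):
    x = torch.randn(n, 64)
    print(n, attention_scores(x).shape)  # doubling n quadruples the entries
```

Doubling the sequence length quadruples the number of entries in the score matrix, which is why standard Transformers become impractical on long documents.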