Unlike previous state-of-the-art methods, our efficient formulation of self-attention allows it to be used at all stages of the network.