Usage tips
The specific attention pattern can be controlled at training and test time using the perm_mask input.
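As a rough illustration of the tip above, the sketch below builds a mask from a chosen factorization order. It assumes the convention described in the Transformers documentation for XLNet, where perm_mask has shape (batch_size, seq_len, seq_len) and perm_mask[k, i, j] = 1 means token i does not attend to token j in batch k. The helper name build_perm_mask and the example order are hypothetical, and plain Python lists are used so the snippet runs without extra dependencies.

```python
def build_perm_mask(order):
    """Build a perm_mask-style nested list for one sequence.

    `order` is a hypothetical factorization order: order[t] is the index
    of the token predicted at step t. Assumed convention: 1.0 means
    "token i may NOT attend to token j", 0.0 means "may attend".
    """
    seq_len = len(order)

    # rank[i] = step at which token i appears in the factorization order
    rank = [0] * seq_len
    for step, token_idx in enumerate(order):
        rank[token_idx] = step

    # Token i may only see tokens that come strictly earlier in the order,
    # so every j with rank[j] >= rank[i] (including i itself) is masked.
    mask = [
        [1.0 if rank[j] >= rank[i] else 0.0 for j in range(seq_len)]
        for i in range(seq_len)
    ]
    return [mask]  # leading batch dimension of size 1


# Example: predict token 2 first, then 0, then 3, then 1.
perm_mask = build_perm_mask([2, 0, 3, 1])
for row in perm_mask[0]:
    print(row)
```

To pass such a mask to a model, the nested list would typically be converted to a float tensor first (for example with torch.tensor(perm_mask, dtype=torch.float)); whether masking the diagonal is appropriate depends on the use case, so treat this as a starting point rather than the reference recipe.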