The user can define which tokens attend "locally" and which tokens attend "globally" by setting the tensor
global_attention_mask appropriately at run-time: a value of 0 marks a token for local attention, and a value of 1 marks it for global attention.
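
As a minimal sketch of how such a mask might be built, the snippet below creates a mask with one entry per token, assuming the usual convention that 0 means local attention and 1 means global attention, and that the mask has the same (batch, sequence) shape as the input ids. The helper name build_global_attention_mask and its parameters are hypothetical, chosen for illustration only. Plain Python lists are used to keep the example dependency-free; in practice the result would be converted to a tensor before being passed to the model.

```python
def build_global_attention_mask(batch_size, seq_len, global_positions):
    """Return a batch_size x seq_len nested list of 0/1 flags.

    Hypothetical helper for illustration:
    0 = the token attends locally, 1 = the token attends globally.
    """
    # Start with every token attending locally.
    mask = [[0] * seq_len for _ in range(batch_size)]
    for row in mask:
        for pos in global_positions:
            if not 0 <= pos < seq_len:
                raise IndexError(f"position {pos} is outside the sequence length {seq_len}")
            # Mark the chosen token for global attention.
            row[pos] = 1
    return mask


# Example: give global attention to the first token and to the token at index 5.
mask = build_global_attention_mask(batch_size=2, seq_len=8, global_positions=[0, 5])

# Assumption: with PyTorch installed, the list could then be turned into a tensor,
# e.g. torch.tensor(mask), and passed as global_attention_mask at run-time.
```

Which positions to mark is task-dependent; marking the first token, as done here, is only one possible choice.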