All Longformer models employ the following logic for `global_attention_mask`:

- 0: the token attends "locally",
- 1: the token attends "globally".
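
For illustration, here is a minimal sketch of how such a mask can be built and passed to a Longformer model. It assumes the `allenai/longformer-base-4096` checkpoint and marks only the first token as global; which tokens should attend globally is task-dependent.

```python
import torch
from transformers import LongformerModel, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

inputs = tokenizer("Hello world!", return_tensors="pt")

# Start with all zeros: every token attends locally.
global_attention_mask = torch.zeros_like(inputs["input_ids"])

# Set selected positions to 1 so those tokens attend globally,
# e.g. the first (<s>/CLS) token here.
global_attention_mask[:, 0] = 1

outputs = model(**inputs, global_attention_mask=global_attention_mask)
```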