A selected few tokens attend "globally" to all other tokens, as is
conventionally done for all tokens in BertSelfAttention.
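
To make the idea concrete, here is a minimal sketch in plain Python of which positions each token may attend to. It is an illustration only, not the library's implementation: the function name `build_attention_pattern`, its parameters, and the sliding-window rule used for the non-global tokens are all assumptions introduced for this example.

```python
def build_attention_pattern(seq_len, global_indices, window=1):
    """Return a seq_len x seq_len matrix of booleans.

    Entry [i][j] is True when token i may attend to token j.
    Hypothetical helper for illustration only.
    """
    global_set = set(global_indices)
    pattern = []
    for i in range(seq_len):
        if i in global_set:
            # A "global" token attends to every position, as every
            # token does in full self-attention.
            row = [True] * seq_len
        else:
            # Assumption: all other tokens only see neighbours within
            # `window` positions on either side.
            row = [abs(i - j) <= window for j in range(seq_len)]
        pattern.append(row)
    return pattern


pattern = build_attention_pattern(seq_len=6, global_indices=[0], window=1)
for row in pattern:
    # 'x' marks an allowed attention pair, '.' a masked one.
    print("".join("x" if allowed else "." for allowed in row))
```

Here only token 0 is marked global, so the first row is fully populated while the remaining rows stay narrow. Marking every token as global would fill in every row, which is the full attention pattern of BertSelfAttention.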