It permutes the tokens in the sentence, then lets the model use the first n tokens of that permuted order to predict token n+1.
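The permutation step can be illustrated with a minimal sketch. It assumes the permutation is applied through an attention mask while the tokens stay in their original positions, which the text above does not state. The function name `permutation_attention_mask`, the example sentence, and the fixed `order` are hypothetical and chosen only for illustration.

```python
def permutation_attention_mask(order):
    """Build a visibility mask from a prediction order.

    order[k] is the original position of the token predicted at step k.
    Returns visible, where visible[i][j] is True when the token at
    position i may look at the token at position j.
    """
    n = len(order)
    # step[pos] = the step at which the token at original position pos is predicted
    step = [0] * n
    for k, pos in enumerate(order):
        step[pos] = k
    # A token sees only tokens predicted at an earlier step of the permuted order
    return [[step[j] < step[i] for j in range(n)] for i in range(n)]


tokens = ["the", "cat", "sat", "down"]
order = [2, 0, 3, 1]  # hypothetical order; in training it would be sampled at random
mask = permutation_attention_mask(order)

for k, pos in enumerate(order):
    context = [tokens[j] for j in range(len(tokens)) if mask[pos][j]]
    print(f"step {k}: predict {tokens[pos]!r} from {context}")
```

Each step predicts one token from the tokens that precede it in the permuted order, so step n sees exactly n tokens of context, regardless of where those tokens sit in the original sentence.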