Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
This solves the issue of bidirectionality, where the model could cheat and see all the words and "predict" the next word.