But GPT-2 lacked the bidirectional context of BERT's pretraining: as an autoregressive model it attends only to preceding tokens, making it less suited to tasks that benefit from seeing the full input, such as text classification or extractive question answering.