The input embeddings (including the masked spans) are passed through the encoder to produce the final hidden states. Unlike BERT, however, BART doesn't add a final feedforward network on top of the encoder to predict a word.