Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
Because T5 has been trained with the span-mask denoising objective,
it can be used to predict the sentinel (masked-out) tokens during inference.