To leverage the inductive biases learned by larger models during pretraining, we introduce a triple loss combining language modeling, distillation and cosine-distance losses.
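
As a rough illustration of how three such terms could be combined, the following plain-Python sketch computes each loss for a single token position and sums them. The function names, the equal weights, the temperature of 2.0, and the use of softened cross-entropy for the distillation term are all illustrative assumptions; the text above does not specify them.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def language_modeling_loss(student_logits, target_index):
    """Cross-entropy of the student's prediction against the true token."""
    probs = softmax(student_logits)
    return -math.log(probs[target_index])


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between temperature-softened teacher and student outputs.

    Assumption: the same temperature is applied to both models.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


def cosine_distance_loss(student_hidden, teacher_hidden):
    """1 - cosine similarity of the hidden vectors (assumed non-zero)."""
    dot = sum(a * b for a, b in zip(student_hidden, teacher_hidden))
    norm_s = math.sqrt(sum(a * a for a in student_hidden))
    norm_t = math.sqrt(sum(b * b for b in teacher_hidden))
    return 1.0 - dot / (norm_s * norm_t)


def triple_loss(student_logits, teacher_logits, target_index,
                student_hidden, teacher_hidden,
                w_lm=1.0, w_distill=1.0, w_cos=1.0, temperature=2.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (
        w_lm * language_modeling_loss(student_logits, target_index)
        + w_distill * distillation_loss(student_logits, teacher_logits, temperature)
        + w_cos * cosine_distance_loss(student_hidden, teacher_hidden)
    )
```

In this sketch the cosine term vanishes when the student and teacher hidden vectors point in the same direction, so it only penalizes misalignment between the two representations, while the distillation term pulls the student's output distribution toward the teacher's.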