The core idea is to predict latent representations of the full input data from a masked view of the input, in a self-distillation setup using a standard Transformer architecture.