Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
wav2vec 2.0 masks
the speech input in the latent space and solves a contrastive task defined over a quantization of the latent
representations which are jointly learned.