Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
The model has to predict the true quantized speech representation of the masked prediction from a set of false ones, encouraging the model to find the most similar context vector and quantized speech unit (the target label).