Second, we construct a contrastive language-audio pretraining model, evaluating several combinations of audio encoders and text encoders.
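The core of such a contrastive language-audio model is a symmetric InfoNCE objective that pulls matched audio-text embedding pairs together while pushing mismatched pairs apart, regardless of which encoder backbones produce the embeddings. A minimal sketch of that loss, assuming the encoders have already produced fixed-size embedding batches (the function name and temperature value here are illustrative, not from the source):

```python
import numpy as np

def symmetric_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired audio/text embeddings."""
    # L2-normalize so the dot product becomes cosine similarity.
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ t.T / temperature            # (batch, batch) similarity matrix
    labels = np.arange(len(logits))           # matched pairs lie on the diagonal

    def xent(l):
        # Row-wise cross-entropy with the diagonal as the target class.
        l = l - l.max(axis=1, keepdims=True)  # subtract max for numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the audio->text and text->audio directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

Because the loss only touches the embeddings, any audio encoder and any text encoder can be swapped in as long as both project into the same-dimensional space.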