We incorporate a feature fusion mechanism and keyword-to-caption augmentation into the model design. Feature fusion lets the model process audio inputs of variable lengths, and both techniques improve its overall performance.
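To illustrate how variable-length audio can be brought to a fixed size before feature fusion, here is a minimal sketch. It assumes a scheme in which a long input is represented by one downsampled global view plus three local crops, taken from the front, middle and back thirds of the clip, and a short input is repeated and zero-padded. This input layout, the helper name `prepare_fusion_views`, and the nearest-neighbour downsampling are assumptions made for illustration, not the model's actual implementation. The learned fusion module that combines the views is omitted. The mel spectrogram is modeled as a plain list of frames so the example has no dependencies.

```python
import random
from typing import List, Tuple

Frame = List[float]  # one time step of a mel spectrogram (its mel bins)


def _split_into_thirds(indices: List[int]) -> List[List[int]]:
    """Split indices into 3 near-equal contiguous parts (earlier parts get the remainder)."""
    q, r = divmod(len(indices), 3)
    parts, start = [], 0
    for i in range(3):
        size = q + (1 if i < r else 0)
        parts.append(indices[start:start + size])
        start += size
    return parts


def prepare_fusion_views(mel: List[Frame], target_frames: int,
                         rng: random.Random) -> Tuple[List[List[Frame]], bool]:
    """Hypothetical helper: return four fixed-length views [global, front, middle, back]
    and a flag saying whether the input was longer than target_frames."""
    n = len(mel)
    if n == 0 or target_frames <= 0:
        raise ValueError("need a non-empty input and a positive target length")

    if n <= target_frames:
        # Short input: repeat the whole clip, then zero-pad up to target_frames.
        tiled = mel * (target_frames // n)
        zero = [0.0] * len(mel[0])
        padded = tiled + [list(zero) for _ in range(target_frames - len(tiled))]
        # No fusion needed: the same padded clip fills all four slots.
        return [padded] * 4, False

    # Long input, global view: nearest-neighbour downsampling of the whole clip
    # (a simplification; an interpolation-based resize could be used instead).
    global_view = [mel[i * n // target_frames] for i in range(target_frames)]

    # Local views: one random crop start drawn from each third of the valid start positions.
    starts = list(range(n - target_frames + 1))
    local_views = []
    for part in _split_into_thirds(starts):
        start = rng.choice(part) if part else 0  # fall back to 0 when a third is empty
        local_views.append(mel[start:start + target_frames])

    return [global_view] + local_views, True
```

Because every view has the same number of frames, a single audio encoder can process all four in one batch. The boolean flag lets downstream code skip the fusion step for clips that already fit the target length.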