Leveraging large-scale unlabeled speech and text data, we pre-train SpeechT5 to learn a unified-modal representation, with the aim of improving its modeling capability for both speech and text.