The abstract from the paper is the following:
Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning.