File size: 2,219 Bytes
57bdca5 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 |
UniSpeech Overview The UniSpeech model was proposed in UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang . The abstract from the paper is the following: In this paper, we propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data, in which supervised phonetic CTC learning and phonetically-aware contrastive self-supervised learning are conducted in a multi-task learning manner. The resultant representations can capture information more correlated with phonetic structures and improve the generalization across languages and domains. We evaluate the effectiveness of UniSpeech for cross-lingual representation learning on public CommonVoice corpus. The results show that UniSpeech outperforms self-supervised pretraining and supervised transfer learning for speech recognition by a maximum of 13.4% and 17.8% relative phone error rate reductions respectively (averaged over all testing languages). The transferability of UniSpeech is also demonstrated on a domain-shift speech recognition task, i.e., a relative word error rate reduction of 6% against the previous approach. This model was contributed by patrickvonplaten. The Authors' code can be found here. Usage tips UniSpeech is a speech model that accepts a float array corresponding to the raw waveform of the speech signal. Please use [Wav2Vec2Processor] for the feature extraction. UniSpeech model can be fine-tuned using connectionist temporal classification (CTC) so the model output has to be decoded using [Wav2Vec2CTCTokenizer]. Resources Audio classification task guide Automatic speech recognition task guide UniSpeechConfig [[autodoc]] UniSpeechConfig UniSpeech specific outputs [[autodoc]] models.unispeech.modeling_unispeech.UniSpeechForPreTrainingOutput UniSpeechModel [[autodoc]] UniSpeechModel - forward UniSpeechForCTC [[autodoc]] UniSpeechForCTC - forward UniSpeechForSequenceClassification [[autodoc]] UniSpeechForSequenceClassification - forward UniSpeechForPreTraining [[autodoc]] UniSpeechForPreTraining - forward |