	XLS-R
	Overview
	The XLS-R model was proposed in XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman
	Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
	The abstract from the paper is the following:
	This paper presents XLS-R, a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0.
	We train models with up to 2B parameters on nearly half a million hours of publicly available speech audio in 128
	languages, an order of magnitude more public data than the largest known prior work. Our evaluation covers a wide range
	of tasks, domains, data regimes and languages, both high and low-resource. On the CoVoST-2 speech translation
	benchmark, we improve the previous state of the art by an average of 7.4 BLEU over 21 translation directions into
	English. For speech recognition, XLS-R improves over the best known prior work on BABEL, MLS, CommonVoice as well as
	VoxPopuli, lowering error rates by 14-34% relative on average. XLS-R also sets a new state of the art on VoxLingua107
	language identification. Moreover, we show that with sufficient model size, cross-lingual pretraining can outperform
	English-only pretraining when translating English speech into other languages, a setting which favors monolingual
	pretraining. We hope XLS-R can help to improve speech processing tasks for many more languages of the world.
	Relevant checkpoints can be found under https://huggingface.co/models?other=xls_r.
	The original code was released as part of the fairseq repository.
	Usage tips

	XLS-R is a speech model that accepts a float array corresponding to the raw waveform of the speech signal.
	XLS-R checkpoints fine-tuned for speech recognition are trained using connectionist temporal classification (CTC), so the
	model output has to be decoded using Wav2Vec2CTCTokenizer.
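
The two tips above can be illustrated with a minimal, library-free sketch. It is not the Transformers implementation: the helper names (pcm16_to_float, normalize, ctc_greedy_decode), the toy vocabulary, and the frame ids are all hypothetical. It also assumes 16 kHz mono input, a blank token at id 0, and "|" as the word delimiter, which is our understanding of the Wav2Vec2 defaults and should be checked against the checkpoint you use.

```python
import math

# Assumption: XLS-R, like wav2vec 2.0, expects 16 kHz mono audio.
SAMPLE_RATE = 16_000


def pcm16_to_float(samples):
    """Scale 16-bit PCM integers to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def normalize(waveform, eps=1e-7):
    """Shift a float waveform to zero mean and scale it to unit variance."""
    mean = sum(waveform) / len(waveform)
    var = sum((x - mean) ** 2 for x in waveform) / len(waveform)
    return [(x - mean) / math.sqrt(var + eps) for x in waveform]


def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Greedy CTC decoding: collapse repeated ids, then drop blanks.

    frame_ids holds the most likely token id for each output frame.
    """
    tokens = []
    previous = None
    for idx in frame_ids:
        # Emit a token only when the id changes and is not the blank.
        if idx != previous and idx != blank_id:
            tokens.append(id_to_token[idx])
        previous = idx
    return "".join(tokens).replace(word_delimiter, " ")


# Hypothetical toy vocabulary; real checkpoints ship their own vocab file.
VOCAB = {0: "<pad>", 1: "h", 2: "e", 3: "l", 4: "o", 5: "|"}

# One second of a 440 Hz tone as stand-in 16-bit PCM audio.
pcm = [int(12000 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE))
       for t in range(SAMPLE_RATE)]
model_input = normalize(pcm16_to_float(pcm))  # float array, as the model expects

# Stand-in for the per-frame argmax over the model's output logits.
frame_ids = [1, 1, 0, 2, 2, 3, 0, 3, 3, 4, 0]
print(ctc_greedy_decode(frame_ids, VOCAB))  # prints "hello"
```

The blank between the two runs of id 3 is what lets CTC emit a doubled letter; without it, the repeated frames would collapse into a single "l". In practice the feature extractor and tokenizer classes referenced in the Wav2Vec2 documentation take care of both steps.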

	XLS-R's architecture is based on the Wav2Vec2 model; refer to Wav2Vec2's documentation page for the API reference.