Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
Create a feature extractor to handle the audio inputs:
from transformers import Wav2Vec2FeatureExtractor
feature_extractor = Wav2Vec2FeatureExtractor(padding_value=1.0, do_normalize=True)
Create a tokenizer to handle the text inputs:
from transformers import Wav2Vec2CTCTokenizer
tokenizer = Wav2Vec2CTCTokenizer(vocab_file="my_vocab_file.txt")
Combine the feature extractor and tokenizer in [Wav2Vec2Processor]:
from transformers import Wav2Vec2Processor
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)
With two basic classes - configuration and model - and an additional preprocessing class (tokenizer, image processor, feature extractor, or processor), you can create any of the models supported by 🤗 Transformers.