# DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

A self-supervised speech representation learning model that combines masked language modeling with self-distillation and online clustering. It achieves state-of-the-art performance on a range of speech processing tasks.
## Table of Contents

- [Model Details](#model-details)
- [Usage](#usage)
- [Citation](#citation)
- [Additional Information](#additional-information)
## Model Details

### Developers
- Alexander H. Liu, Heng-Jui Chang (MIT CSAIL)
- Michael Auli, Wei-Ning Hsu (Meta AI)
- James Glass (MIT CSAIL)
### Model Type
Self-supervised speech representation learning (Wav2Vec2 architecture variant)
### Key Features

- Self-distillation with a teacher-student framework
- Dynamic online clustering of teacher representations (see the sketch below)
- Contextualized masking strategy
- Combined contrastive and diversity losses
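To make the first two features concrete, the sketch below shows the two mechanisms in isolation: an exponential-moving-average (EMA) teacher update, and a VQ-style online clustering step that assigns teacher features to codewords and refreshes the codebook with EMA statistics. This is a minimal illustration with hypothetical helper names (`ema_update`, `online_cluster_step`), not the DinoSR reference implementation; see the paper and GitHub repository for exact details.

```python
import torch
import torch.nn.functional as F

def ema_update(teacher, student, decay=0.999):
    """EMA teacher update (hypothetical helper).

    After each student optimization step, teacher parameters track
    the student's: t <- decay * t + (1 - decay) * s.
    """
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(decay).add_(s, alpha=1.0 - decay)

def online_cluster_step(codebook_sum, counts, feats, decay=0.9):
    """One VQ-style online clustering step (hypothetical helper).

    codebook_sum: [V, D] EMA of summed features assigned to each codeword
    counts:       [V]    EMA of how many features each codeword received
    feats:        [N, D] teacher representations at masked positions
    Returns the cluster indices, which serve as the student's targets.
    """
    with torch.no_grad():
        centroids = codebook_sum / counts.clamp(min=1e-6).unsqueeze(1)
        # Assign each feature to its nearest centroid
        idx = torch.cdist(feats, centroids).argmin(dim=-1)   # [N]
        onehot = F.one_hot(idx, centroids.size(0)).float()   # [N, V]
        # EMA updates of cluster sizes and per-cluster feature sums
        counts.mul_(decay).add_(onehot.sum(dim=0), alpha=1.0 - decay)
        codebook_sum.mul_(decay).add_(onehot.t() @ feats, alpha=1.0 - decay)
    return idx
```

In DinoSR's training scheme, the student is trained with cross-entropy to predict these cluster indices at masked positions, and the teacher is refreshed via the EMA update after every optimizer step.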
## Usage

### Feature Extraction
```python
from transformers import Wav2Vec2ForPreTraining, Wav2Vec2FeatureExtractor
import torch
import librosa

# Load model components
model = Wav2Vec2ForPreTraining.from_pretrained("MohammadJRanjbar/DinoSR")
processor = Wav2Vec2FeatureExtractor.from_pretrained("MohammadJRanjbar/DinoSR")

# Load and preprocess audio at the 16 kHz rate the model expects
audio, sr = librosa.load("speech.wav", sr=16000)
inputs = processor(audio, return_tensors="pt", sampling_rate=16000)

# Extract representations
with torch.no_grad():
    outputs = model(**inputs)

speech_features = outputs.projected_states  # [batch_size, seq_len, 256]
```
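If you want layer-wise features rather than the final projected states, the same forward call can also return all hidden states. Since DinoSR derives its targets from intermediate Transformer layers, those layers are often the most informative for downstream probing; which layer works best is task-dependent, so the layer index below is only an example.

```python
# Reuse the model and inputs from above; request per-layer outputs
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Tuple of [batch_size, seq_len, hidden_size] tensors, one per Transformer
# layer (plus the initial embedding output at index 0)
hidden_states = outputs.hidden_states
mid_layer_features = hidden_states[8]  # example: layer 8
```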
### Fine-tuning for ASR
```python
from transformers import Wav2Vec2ForCTC

# Load the pretrained encoder with a randomly initialized CTC head
model = Wav2Vec2ForCTC.from_pretrained(
    "MohammadJRanjbar/DinoSR",
    attention_dropout=0.1,
    hidden_dropout=0.1,
    layerdrop=0.1,
    ctc_loss_reduction="mean",
)

# Freeze the convolutional feature encoder so only the Transformer is fine-tuned
model.freeze_feature_encoder()
```
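For reference, a single training step might look like the sketch below. It assumes you have built a `Wav2Vec2Processor` (the feature extractor paired with a `Wav2Vec2CTCTokenizer` over your transcript vocabulary) saved at a hypothetical `path/to/processor`, and reuses the `audio` array from the feature-extraction example; real fine-tuning would run a proper training loop (or `Trainer`) over a dataset.

```python
from transformers import Wav2Vec2Processor
import torch

# Hypothetical path: a processor pairing the feature extractor with a
# CTC tokenizer built over your target vocabulary
processor = Wav2Vec2Processor.from_pretrained("path/to/processor")

inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
# Transcript casing/characters must match the tokenizer's vocabulary
labels = processor(text="a transcript of the audio", return_tensors="pt").input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
outputs = model(input_values=inputs.input_values, labels=labels)
loss = outputs.loss  # CTC loss, averaged per ctc_loss_reduction="mean"
loss.backward()
optimizer.step()
optimizer.zero_grad()
```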
## Citation

```bibtex
@article{liu2023dinosr,
  title={DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning},
  author={Liu, Alexander H and Chang, Heng-Jui and Auli, Michael and Hsu, Wei-Ning and Glass, James},
  journal={arXiv preprint arXiv:2305.10005},
  year={2023}
}
```
## Additional Information

### Resources
- [Original Paper](https://arxiv.org/abs/2305.10005)
- GitHub Repository
- Hugging Face Documentation

This model was converted from Fairseq to Hugging Face Transformers using the convert.py script. For the original model, see the GitHub repository.
### Contact

For questions and feedback:

- Alexander H. Liu: alexhliu@mit.edu
- Model maintainer: MohammadJRanjbar

This model card was generated using best practices from the Model Card Creator.