DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

DinoSR is a self-supervised speech representation learning model that combines masked prediction with self-distillation and online clustering. It achieves state-of-the-art performance on several downstream speech processing benchmarks.

Table of Contents

  • Model Details
  • Usage
  • Citation
  • Additional Information

Model Details

Developers

  • Alexander H. Liu, Heng-Jui Chang (MIT CSAIL)
  • Michael Auli, Wei-Ning Hsu (Meta AI)
  • James Glass (MIT CSAIL)

Model Type

Self-supervised speech representation learning (Wav2Vec2 architecture variant, ~95.8M parameters, F32)

Key Features

  • Teacher-student self-distillation with an EMA-updated teacher
  • Online clustering of teacher representations into discrete codebooks
  • Contextualized span masking of the student's input
  • Masked prediction of the teacher's cluster assignments via a cross-entropy objective (a minimal training-step sketch follows this list)
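
The interplay of these components is easiest to see in code. The following is a minimal, illustrative PyTorch sketch of the training objective, not the authors' implementation: the linear "encoders", dimensions, decay rates, and all names here are stand-in assumptions (DinoSR itself uses Transformer encoders and codebooks over the top teacher layers).

import torch
import torch.nn as nn
import torch.nn.functional as F

D, K = 256, 64                      # embedding dim and codebook size (illustrative)
student = nn.Linear(80, D)          # stand-in encoders; DinoSR uses Transformers
teacher = nn.Linear(80, D)
teacher.load_state_dict(student.state_dict())
head = nn.Linear(D, K)              # student head predicting cluster logits
codebook = torch.randn(K, D)        # online-clustered codewords

def ema_update(teacher, student, decay=0.999):
    # Teacher weights track an exponential moving average of the student
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(decay).add_(s, alpha=1 - decay)

def cluster_targets(codebook, feats, decay=0.9):
    # Nearest-codeword assignment, then EMA update of the used codewords
    targets = torch.cdist(feats, codebook).argmin(dim=1)  # [T]
    for k in targets.unique().tolist():
        mean = feats[targets == k].mean(dim=0)
        codebook[k] = decay * codebook[k] + (1 - decay) * mean
    return targets

def train_step(x, mask):
    # Teacher sees clean input; its frames are discretized by online clustering
    with torch.no_grad():
        targets = cluster_targets(codebook, teacher(x))
    # Student sees masked input (zeroing is a crude stand-in for span masking)
    logits = head(student(x.masked_fill(mask.unsqueeze(-1), 0.0)))
    # Cross-entropy: predict the teacher's cluster index at masked frames
    loss = F.cross_entropy(logits[mask], targets[mask])
    ema_update(teacher, student)    # in practice, done after the optimizer step
    return loss

x = torch.randn(100, 80)            # 100 frames of illustrative input features
mask = torch.rand(100) < 0.5
train_step(x, mask).backward()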

Usage

Feature Extraction

from transformers import Wav2Vec2ForPreTraining, Wav2Vec2FeatureExtractor
import torch
import librosa

# Load model components
model = Wav2Vec2ForPreTraining.from_pretrained("MohammadJRanjbar/DinoSR")
processor = Wav2Vec2FeatureExtractor.from_pretrained("MohammadJRanjbar/DinoSR")

# Process audio
audio, sr = librosa.load("speech.wav", sr=16000)
inputs = processor(audio, return_tensors="pt", sampling_rate=16000)

# Extract representations
with torch.no_grad():
    outputs = model(**inputs)
    
speech_features = outputs.projected_states  # [batch_size, seq_len, 256]
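
The projected_states tensor is the output of the pre-training head. For downstream probing, intermediate Transformer layers often transfer better; a minimal variant of the call above requests them via a standard transformers keyword argument:

# Request per-layer hidden states alongside the pre-training outputs
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

hidden_states = outputs.hidden_states  # tuple of [batch_size, seq_len, hidden_size], one per layer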

Fine-tuning for ASR

from transformers import Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained(
    "MohammadJRanjbar/DinoSR",
    attention_dropout=0.1,
    hidden_dropout=0.1,
    layerdrop=0.1,
    ctc_loss_reduction="mean"
)

# Freeze feature encoder
model.freeze_feature_encoder()
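
Before training, pair the model with a processor that includes a CTC tokenizer for the target vocabulary. The step below is a minimal sketch; the processor path and transcript are placeholders, not part of this repository:

from transformers import Wav2Vec2Processor

# Hypothetical processor combining the feature extractor with a CTC tokenizer
processor = Wav2Vec2Processor.from_pretrained("path/to/your-processor")

batch = processor(audio, sampling_rate=16000, return_tensors="pt")
labels = processor(text="your transcript here", return_tensors="pt").input_ids

# Wav2Vec2ForCTC computes the CTC loss internally when labels are given
outputs = model(input_values=batch.input_values, labels=labels)
outputs.loss.backward()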

Citation

@article{liu2023dinosr,
  title={DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning},
  author={Liu, Alexander H and Chang, Heng-Jui and Auli, Michael and Hsu, Wei-Ning and Glass, James},
  journal={arXiv preprint arXiv:2305.10005},
  year={2023}
}

Additional Information

Resources

  • Paper: https://arxiv.org/abs/2305.10005

Contact

For questions and feedback, please open a discussion in the Community tab of this repository.

This model card was generated using best practices from Model Card Creator
