Audio classification is a broad category with many specific applications, including:
acoustic scene classification: label audio with a scene label ("office", "beach", "stadium")
acoustic event detection: label audio with a sound event label ("car horn", "whale calling", "glass breaking")
tagging: label audio containing multiple sounds (birdsongs, speaker identification in a meeting)
music classification: label music with a genre label ("metal", "hip-hop", "country")
from transformers import pipeline
classifier = pipeline(task="audio-classification", model="superb/hubert-base-superb-er")
preds = classifier("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
preds
[{'score': 0.4532, 'label': 'hap'},
{'score': 0.3622, 'label': 'sad'},
{'score': 0.0943, 'label': 'neu'},
{'score': 0.0903, 'label': 'ang'}]
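The pipeline output above is a plain list of dictionaries, so it can be post-processed with ordinary Python. The following is a minimal sketch that reuses the scores shown above; the helper names `top_prediction` and `labels_above` and the 0.3 threshold are illustrative assumptions, not part of the Transformers API.

```python
# Predictions copied from the pipeline output shown above.
preds = [
    {"score": 0.4532, "label": "hap"},
    {"score": 0.3622, "label": "sad"},
    {"score": 0.0943, "label": "neu"},
    {"score": 0.0903, "label": "ang"},
]


def top_prediction(preds):
    """Return the prediction dict with the highest score (hypothetical helper)."""
    return max(preds, key=lambda p: p["score"])


def labels_above(preds, threshold):
    """Return the labels whose score is at least `threshold` (hypothetical helper)."""
    return [p["label"] for p in preds if p["score"] >= threshold]


best = top_prediction(preds)
print(best["label"], best["score"])  # hap 0.4532

# The 0.3 cutoff is an arbitrary value chosen for illustration.
confident = labels_above(preds, threshold=0.3)
print(confident)  # ['hap', 'sad']
```

Taking the single highest score suits single-label tasks such as scene or genre classification, while a threshold is one simple way to keep several labels when a clip may contain more than one sound.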
Automatic speech recognition
Automatic speech recognition (ASR) transcribes speech into text.