|
It is a broad category with many specific applications, some of which include: |
|
|
|
acoustic scene classification: label audio with a scene label ("office", "beach", "stadium") |
|
acoustic event detection: label audio with a sound event label ("car horn", "whale calling", "glass breaking") |
|
tagging: label audio containing multiple sounds (birdsongs, speaker identification in a meeting) |
|
music classification: label music with a genre label ("metal", "hip-hop", "country") |
|
|
|
from transformers import pipeline |
|
classifier = pipeline(task="audio-classification", model="superb/hubert-base-superb-er") |
|
preds = classifier("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac") |
|
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds] |
|
preds |
|
[{'score': 0.4532, 'label': 'hap'}, |
|
{'score': 0.3622, 'label': 'sad'}, |
|
{'score': 0.0943, 'label': 'neu'}, |
|
{'score': 0.0903, 'label': 'ang'}] |
|
|
|
Automatic speech recognition |
|
Automatic speech recognition (ASR) transcribes speech into text. |