---
license: mit
language: fr
datasets:
- Cnam-LMSSC/vibravox
tags:
- audio
- audio-to-audio
- speech
---
# Master Model Card: Vibravox Audio Bandwidth Extension Models
## Overview
This master model card serves as an entry point for exploring [multiple **audio bandwidth extension** (BWE) models](https://huggingface.co/Cnam-LMSSC/vibravox_EBEN_models#available-models) trained on different sensor data from the [Vibravox dataset](https://huggingface.co/datasets/Cnam-LMSSC/vibravox).
These models are designed to enhance the audio quality of body-conducted speech by denoising it and regenerating the mid and high frequencies from the low-frequency content alone.
Each model is trained on a specific sensor to address various audio capture scenarios using **body-conducted** sound and vibration sensors.
## Disclaimer
Each of these models has been trained for **specific non-conventional speech sensors** and is intended to be used with **in-domain data**.
Please be advised that using these models on data from other sensors may result in suboptimal performance.
## Usage
All models are trained using [Configurable EBEN](https://github.com/jhauret/vibravox/blob/main/vibravox/torch_modules/dnn/eben_generator.py) (see [publication in IEEE TASLP](https://ieeexplore.ieee.org/document/10244161) - [arXiv link](https://arxiv.org/abs/2303.10008)) and adapted to different sensor inputs. They are intended to be used at a sample rate of 16 kHz.
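As a minimal usage sketch (not an official snippet): the example below assumes that the `EBENGenerator` class from the repository above inherits huggingface_hub's `PyTorchModelHubMixin`, so that checkpoints can be loaded with `from_pretrained`. The audio file name is a placeholder, and the forward-pass unpacking may need adjusting to the exact version of the code.

```python
import torch
import torchaudio

# Assumption: EBENGenerator inherits huggingface_hub's PyTorchModelHubMixin,
# which provides `from_pretrained`; check the vibravox repository if loading fails.
from vibravox.torch_modules.dnn.eben_generator import EBENGenerator

model = EBENGenerator.from_pretrained("Cnam-LMSSC/EBEN_throat_microphone")
model.eval()

# Placeholder file: a recording from the sensor the chosen model was trained on.
waveform, sample_rate = torchaudio.load("body_conducted_recording.wav")

# The models expect 16 kHz input; resample if necessary.
if sample_rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

with torch.no_grad():
    # Add a batch dimension: (batch, channels, time).
    audio = waveform.unsqueeze(0)
    # Assumption: the generator exposes a helper that trims the signal to a
    # length compatible with its analysis/synthesis stages.
    audio = model.cut_to_valid_length(audio)
    # Depending on the code version, the generator may return the enhanced
    # waveform alone or together with its sub-band decomposition.
    output = model(audio)
    enhanced = output[0] if isinstance(output, tuple) else output
```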
## Training Procedure
Detailed instructions for reproducing the experiments are available in the [jhauret/vibravox](https://github.com/jhauret/vibravox) GitHub repository and in the [Vibravox paper on arXiv](https://arxiv.org/abs/2407.11828).
## Available Models
The following models are available, **each trained on a different sensor**, using either the `speech_clean` subset or a synthetic mix of the `speech_clean` and `speechless_noisy` subsets of the [Vibravox dataset](https://huggingface.co/datasets/Cnam-LMSSC/vibravox):
| **Transducer** | **EBEN configuration** | **Model trained on `speech_clean`** | **Model trained on synthetically mixed `speech_clean` and `speechless_noisy`** |
|:---------------------------|:---------------------|:---------------------|:---------------------|
| In-ear comply foam-embedded microphone | M=4,P=2,Q=4 |[EBEN_soft_in_ear_microphone](https://huggingface.co/Cnam-LMSSC/EBEN_soft_in_ear_microphone) |[EBEN_noisy_soft_in_ear_microphone](https://huggingface.co/Cnam-LMSSC/EBEN_noisy_soft_in_ear_microphone)|
| In-ear rigid earpiece-embedded microphone | M=4,P=2,Q=4 |[EBEN_rigid_in_ear_microphone](https://huggingface.co/Cnam-LMSSC/EBEN_rigid_in_ear_microphone) | [EBEN_noisy_rigid_in_ear_microphone](https://huggingface.co/Cnam-LMSSC/EBEN_noisy_rigid_in_ear_microphone)|
| Forehead miniature vibration sensor | M=4,P=4,Q=4 |[EBEN_forehead_accelerometer](https://huggingface.co/Cnam-LMSSC/EBEN_forehead_accelerometer) | [EBEN_noisy_forehead_accelerometer](https://huggingface.co/Cnam-LMSSC/EBEN_noisy_forehead_accelerometer)|
| Temple vibration pickup | M=4,P=1,Q=4 |[EBEN_temple_vibration_pickup](https://huggingface.co/Cnam-LMSSC/EBEN_temple_vibration_pickup) | [EBEN_noisy_temple_vibration_pickup](https://huggingface.co/Cnam-LMSSC/EBEN_noisy_temple_vibration_pickup)|
| Laryngophone | M=4,P=2,Q=4 |[EBEN_throat_microphone](https://huggingface.co/Cnam-LMSSC/EBEN_throat_microphone) | [EBEN_noisy_throat_microphone](https://huggingface.co/Cnam-LMSSC/EBEN_noisy_throat_microphone)|
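As the table suggests, the checkpoint names follow a regular pattern: `EBEN_<sensor>` for the `speech_clean` models and `EBEN_noisy_<sensor>` for the noise-augmented ones. A small illustrative helper (the function name is ours, not part of any library):

```python
def eben_repo_id(sensor: str, noise_augmented: bool = False) -> str:
    """Build the Hugging Face repo id of a Vibravox EBEN checkpoint."""
    prefix = "EBEN_noisy_" if noise_augmented else "EBEN_"
    return f"Cnam-LMSSC/{prefix}{sensor}"

# e.g. the temple vibration pickup model trained with noise augmentation:
print(eben_repo_id("temple_vibration_pickup", noise_augmented=True))
# Cnam-LMSSC/EBEN_noisy_temple_vibration_pickup
```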