	---
	license: other
	license_name: nvidia-oneway-noncommercial-license
	---

	# PyTorch Implementation of Audio-to-Audio Schrodinger Bridges

	Zhifeng Kong, Kevin J Shih, Weili Nie, Arash Vahdat, Sang-gil Lee, Joao Felipe Santos, Ante Jukic, Rafael Valle, Bryan Catanzaro

	[[paper]](https://arxiv.org/abs/2501.11311) [[GitHub]](https://github.com/NVIDIA/diffusion-audio-restoration) [[Demo]](https://research.nvidia.com/labs/adlr/A2SB/)

	This repo contains the PyTorch implementation of [A2SB: Audio-to-Audio Schrodinger Bridges](https://arxiv.org/abs/2501.11311). A2SB is an audio restoration model tailored for high-res music at 44.1kHz. It performs both bandwidth extension (predicting high-frequency components) and inpainting (regenerating missing segments). Critically, A2SB is end-to-end: it predicts waveform outputs directly, without a vocoder, and it can restore hour-long audio inputs. A2SB achieves state-of-the-art bandwidth extension and inpainting quality on several out-of-distribution music test sets.

	- We propose A2SB, a state-of-the-art, end-to-end, vocoder-free, and multi-task diffusion Schrodinger Bridge model for 44.1kHz high-res music restoration, using an effective factorized audio representation.

	- A2SB is the first long-audio restoration model that can restore hour-long audio without boundary artifacts.
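
To make the two restoration tasks concrete, the sketch below builds the kind of degraded inputs that bandwidth extension and inpainting start from. It is an illustration only and is not taken from the A2SB codebase: the helper names `lowpass_fft` and `mask_segment`, the synthetic test tones, the 4 kHz cutoff, and the 0.2 s gap are all assumptions chosen for the example, and the brick-wall FFT filter is a crude stand-in for real-world band-limiting.

```python
import numpy as np

SAMPLE_RATE = 44_100  # Hz; the sampling rate this model card says A2SB targets


def lowpass_fft(wave, cutoff_hz, sample_rate=SAMPLE_RATE):
    """Hypothetical helper: zero all frequency bins above cutoff_hz.

    This mimics a band-limited recording, i.e. the input to bandwidth extension.
    """
    n = wave.shape[-1]
    spectrum = np.fft.rfft(wave)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    spectrum[..., freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=n)


def mask_segment(wave, start_s, duration_s, sample_rate=SAMPLE_RATE):
    """Hypothetical helper: silence a time span.

    This mimics a recording with a missing segment, i.e. the input to inpainting.
    Returns the degraded waveform and a boolean mask marking the missing samples.
    """
    n = wave.shape[-1]
    start = int(round(start_s * sample_rate))
    stop = min(n, start + int(round(duration_s * sample_rate)))
    missing = np.zeros(n, dtype=bool)
    missing[start:stop] = True
    degraded = wave.copy()
    degraded[..., missing] = 0.0
    return degraded, missing


# Synthetic one-second signal (assumed for illustration): 440 Hz plus 12 kHz tones.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
clean = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 12_000 * t)

# Bandwidth-extension input: everything above 4 kHz has been removed.
band_limited = lowpass_fft(clean, cutoff_hz=4_000)

# Inpainting input: 0.2 s of audio starting at 0.4 s has been removed.
gapped, missing = mask_segment(clean, start_s=0.4, duration_s=0.2)
```

In this framing, a restoration model receives `band_limited` or `gapped` (plus, for inpainting, the location of the gap) and is asked to produce something close to `clean`. How A2SB itself represents and conditions on these inputs is described in the paper and the GitHub repository linked above.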

	## License

	The model is provided under the NVIDIA OneWay NonCommercial License.


	## Citation

	```
	@article{kong2025a2sb,
	title={A2SB: Audio-to-Audio Schrodinger Bridges},
	author={Kong, Zhifeng and Shih, Kevin J and Nie, Weili and Vahdat, Arash and Lee, Sang-gil and Santos, Joao Felipe and Jukic, Ante and Valle, Rafael and Catanzaro, Bryan},
	journal={arXiv preprint arXiv:2501.11311},
	year={2025}
	}
	```