JusperLee committed
Commit 7f73961 · verified · 1 parent: 7a9a666

Update README.md

  1. README.md +69 -5
README.md CHANGED
@@ -1,10 +1,74 @@
  ---
  pipeline_tag: audio-to-audio
  tags:
- - model_hub_mixin
- - pytorch_model_hub_mixin
  ---

- This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- - Library: https://github.com/JusperLee/Apollo
- - Docs: [More Information Needed]
 
  ---
  pipeline_tag: audio-to-audio
  tags:
+ - audio
+ license: apache-2.0
+ language:
+ - en
  ---

+ <h3 align="center">Apollo: Band-sequence Modeling for High-Quality Audio Restoration</h3>
+ <p align="center">
+ <strong>Mohan Xu<sup>*</sup>, Kai Li<sup>*</sup>, Guo Chen, Xiaolin Hu</strong><br>
+ <strong>Tsinghua University, Beijing, China</strong><br>
+ <strong><sup>*</sup>Equal contribution</strong><br>
+ <a href="https://arxiv.org/abs/2409.08514">📜 ICLR 2025</a> | <a href="https://cslikai.cn/TIGER/">🎶 Demo</a> | <a href="https://huggingface.co/datasets/JusperLee/EchoSet">🤗 Dataset</a>
+
+ <p align="center">
+ <img src="https://visitor-badge.laobi.icu/badge?page_id=JusperLee.TIGER" alt="Visitor count" />
+ <img src="https://img.shields.io/github/stars/JusperLee/TIGER?style=social" alt="GitHub stars" />
+ <img alt="Static Badge" src="https://img.shields.io/badge/license-Apache%202.0-blue.svg" />
+ </p>
+
+ <p align="center">
+
+ > TIGER is a lightweight model for speech separation that extracts key acoustic features through frequency band-split, multi-scale, and full-frequency-frame modeling.
+
+ ## 💥 News
+
+ - **[2025-01-23]** We release the code and pre-trained model of TIGER! 🚀
+ - **[2025-01-23]** We release the TIGER model and the EchoSet dataset! 🚀
+
+ ## 📜 Abstract
+
+ In this paper, we propose a speech separation model with significantly reduced parameter size and computational cost: Time-Frequency Interleaved Gain Extraction and Reconstruction Network (TIGER). TIGER leverages prior knowledge to divide frequency bands and applies compression to the frequency information. We employ a multi-scale selective attention (MSA) module to extract contextual features, and introduce a full-frequency-frame attention (F³A) module to capture both temporal and frequency contextual information. Additionally, to evaluate speech separation models more realistically in complex acoustic environments, we introduce a novel dataset called EchoSet. This dataset includes noise and more realistic reverberation (e.g., considering object occlusions and material properties), with speech from two speakers overlapping at random proportions. Experimental results demonstrate that TIGER significantly outperforms the state-of-the-art (SOTA) model TF-GridNet on the EchoSet dataset in both inference speed and separation quality, while reducing the number of parameters by 94.3% and the MACs by 95.3%. These results indicate that by utilizing frequency band-split and interleaved modeling structures, TIGER achieves a substantial reduction in parameters and computational costs while maintaining high performance. Notably, TIGER is the first speech separation model with fewer than 1 million parameters that achieves performance close to the SOTA model.
+
+
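Reviewer note: the band-split step described in the abstract (dividing the frequency axis into bands and compressing each band) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the band edges, the 257-bin spectrogram size, and the mean-pooling used as a stand-in for compression are hypothetical and are not taken from the TIGER code, which presumably learns its compression.

```python
import numpy as np

def split_bands(spec, edges):
    """Slice a (n_bins, n_frames) spectrogram into sub-bands along frequency."""
    return [spec[lo:hi] for lo, hi in zip(edges[:-1], edges[1:])]

def compress_bands(bands):
    """Collapse each band to one value per frame (mean over its frequency bins).

    Mean-pooling is only a placeholder for whatever compression the model uses.
    """
    return np.stack([band.mean(axis=0) for band in bands])

# Hypothetical band layout for 257 frequency bins:
# narrow bands at low frequencies, wider bands at high frequencies.
edges = (
    list(range(0, 64, 4))        # 16 bands of width 4
    + list(range(64, 128, 16))   # 4 bands of width 16
    + list(range(128, 256, 32))  # 3 bands of width 32, then one of width 33
    + [257]
)

rng = np.random.default_rng(0)
spec = np.abs(rng.standard_normal((257, 100)))  # stand-in magnitude spectrogram

bands = split_bands(spec, edges)
features = compress_bands(bands)
print(len(bands), features.shape)  # 24 (24, 100)
```

The point of the sketch is only the shape change: 257 frequency bins per frame become 24 band features per frame, which is the kind of reduction that lets later layers operate on a much shorter frequency sequence.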
+ ## 🚀 Quick Start
+
+ ### Test with Pre-trained Model
+
+ ```bash
+ # Test using speech
+ python inference_speech.py --audio_path test/mix.wav
+
+ # Test using DnR
+ python inference_dnr.py --audio_path test/test_mixture_466.wav
+ ```
+
+ ### Train with EchoSet
+
+ ```bash
+ python audio_train.py --conf_dir configs/tiger.yml
+ ```
+
+ ### Evaluate with EchoSet
+
+ ```bash
+ python audio_test.py --conf_dir configs/tiger.yml
+ ```
+
+ ## 📖 Citation
+
+ ```bibtex
+ @article{xu2024tiger,
+ title={TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation},
+ author={Xu, Mohan and Li, Kai and Chen, Guo and Hu, Xiaolin},
+ journal={arXiv preprint arXiv:2410.01469},
+ year={2024}
+ }
+ ```
+
+ ## 📧 Contact
+
+ If you have any questions, please feel free to contact us via `tsinghua.kaili@gmail.com`.
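
Reviewer note: the abstract says EchoSet mixtures contain "speech from two speakers overlapping at random proportions". A minimal sketch of that idea follows. It is an assumption-laden illustration, not the EchoSet generation pipeline: the function name `mix_with_overlap`, the uniform sampling of the ratio, and the random stand-in signals are hypothetical, and the noise and reverberation that the dataset also includes are left out.

```python
import numpy as np

def mix_with_overlap(s1, s2, overlap_ratio):
    """Overlap the start of s2 with the end of s1.

    overlap_ratio is the fraction of the shorter signal that overlaps.
    Returns the mixture and the two zero-padded, time-aligned sources.
    """
    n_overlap = int(round(overlap_ratio * min(len(s1), len(s2))))
    offset = len(s1) - n_overlap            # sample index where s2 starts
    total = max(len(s1), offset + len(s2))  # length of the mixture
    t1 = np.zeros(total)
    t2 = np.zeros(total)
    t1[:len(s1)] = s1
    t2[offset:offset + len(s2)] = s2
    return t1 + t2, t1, t2

rng = np.random.default_rng(0)
s1 = rng.standard_normal(16000)  # stand-in for speaker 1
s2 = rng.standard_normal(12000)  # stand-in for speaker 2
ratio = rng.uniform(0.0, 1.0)    # random overlap proportion (assumed uniform)

mixture, target1, target2 = mix_with_overlap(s1, s2, ratio)
```

Returning the padded sources alongside the mixture keeps the separation targets aligned with the input, which is what a training or evaluation script would need.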