Qwen2.5 Omni 7B Demo • Generate text and speech responses from text, images, or audio input • Running • 266
Qwen2.5-Omni Collection End-to-End Omni (text, audio, image, video, and natural speech interaction) model based on Qwen2.5 • 3 items • Updated 27 days ago • 90
Text-aware and Context-aware Expressive Audiobook Speech Synthesis Paper • 2406.05672 • Published Jun 9, 2024
WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark Paper • 2406.05763 • Published Jun 9, 2024
HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-form TTS Paper • 2309.13907 • Published Sep 25, 2023
Automatic channel selection and spatial feature integration for multi-channel speech recognition across various array topologies Paper • 2312.09746 • Published Dec 15, 2023