---
title: Linly-Talker
app_file: app_multi.py
sdk: gradio
sdk_version: 4.44.0
---
# Digital Human Intelligent Dialogue System - Linly-Talker: "Interactive Dialogue with Your Virtual Self"
Linly-Talker WebUI
[GitHub](https://github.com/Kedreamix/Linly-Talker)

[Open in Colab](https://colab.research.google.com/github/Kedreamix/Linly-Talker/blob/main/colab_webui.ipynb)
[License](https://github.com/Kedreamix/Linly-Talker/blob/main/LICENSE)
[Hugging Face](https://huggingface.co/Kedreamix/Linly-Talker)
[**English**](./README.md) | [**简体中文**](./README_zh.md)
**2023.12 Update**
- **Users can upload any images for the conversation.**
**2024.01 Update**
- **Exciting news! I've now incorporated both the powerful GeminiPro and Qwen large models into our conversational scene. Users can now upload images during the conversation, adding a whole new dimension to the interactions.**
- **The deployment invocation method for FastAPI has been updated.**
- **The advanced settings options for Microsoft TTS have been updated, increasing the variety of voice types. Additionally, video subtitles have been introduced to enhance visualization.**
- **Updated the GPT multi-turn conversation system to establish contextual connections in dialogue, enhancing the interactivity and realism of the digital persona.**
**2024.02 Update**
- **Updated Gradio to the latest version 4.16.0, providing the interface with additional functionalities such as capturing images from the camera to create digital personas, among others.**
- **ASR and THG have been updated. FunASR from Alibaba has been integrated into ASR, enhancing its speed significantly. Additionally, the THG section now incorporates the Wav2Lip model, while ER-NeRF is currently in preparation (Coming Soon).**
- **I have incorporated the GPT-SoVITS model, which is a voice cloning method. By fine-tuning it with just one minute of a person's speech data, it can effectively clone their voice. The results are quite impressive and worth recommending.**
- **I have integrated a web user interface (WebUI) that allows for better execution of Linly-Talker.**
**2024.04 Update**
- **Updated the offline mode for Paddle TTS, excluding Edge TTS.**
- **Updated ER-NeRF as one of the choices for Avatar generation.**
- **Updated app_talk.py to allow for the free upload of voice and images/videos for generation without being based on a dialogue scenario.**
**2024.05 Update**
- **Updated the beginner-friendly AutoDL deployment tutorial, and also updated the codewithgpu image, allowing for one-click experience and learning.**
- **Updated WebUI.py: the Linly-Talker WebUI now supports multiple modules, multiple models, and multiple options.**
**2024.06 Update**
- **Integrated MuseTalk into Linly-Talker and updated the WebUI, enabling basic real-time conversation capabilities.**
- **The refined WebUI defaults to not loading the LLM model to reduce GPU memory usage. It directly responds with text to complete voiceovers. The enhanced WebUI features three main functions: personalized character generation, multi-turn intelligent dialogue with digital humans, and real-time MuseTalk conversations. These improvements reduce previous GPU memory redundancies and add more prompts to assist users effectively.**
**2024.08 Update**
- **Updated CosyVoice to offer high-quality text-to-speech (TTS) functionality and voice cloning capabilities; also upgraded to Wav2Lipv2 to enhance overall performance.**
**2024.09 Update**
- **Added Linly-Talker API documentation, providing detailed interface descriptions to help users access Linly-Talkerโs features via the API.**
---
CosyVoice voice synthesis and cloning demos:

| | PROMPT TEXT | PROMPT SPEECH | TARGET TEXT | RESULT |
| --- | --- | --- | --- | --- |
| Pre-trained Voice | 中文女 voice (available voices: '中文女', '中文男', '日语男', '粤语女', '英文女', '英文男', '韩语女') | - | 你好，我是通义生成式语音大模型，请问有什么可以帮您的吗？ | [sft.webm](https://github.com/user-attachments/assets/a9f9c8c4-7137-4845-9adb-a93ac304131e) |
| 3s Voice Cloning | 希望你以后能够做的比我还好呦。 | [zero_shot_prompt.webm](https://github.com/user-attachments/assets/1ef09db6-42e5-42d2-acc2-d44e70b147f9) | 收到好友从远方寄来的生日礼物，那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐，笑容如花儿般绽放。 | [zero_shot.webm](https://github.com/user-attachments/assets/ba46c58f-2e16-4440-b920-51ec288f09e6) |
| Cross-lingual Cloning | 在那之后，完全收购那家公司。因此保持管理层的一致性，利益与即将加入家族的资产保持一致。这就是我们有时不买下全部的原因。 | [cross_lingual_prompt.webm](https://github.com/user-attachments/assets/378ae5e6-b52a-47b4-b0db-d84d1edd6e56) | <\|en\|>And then later on, fully acquiring that company. So keeping management in line, interest in line with the asset that's coming into the family is a reason why sometimes we don't buy the whole thing. | [cross_lingual.webm](https://github.com/user-attachments/assets/b0162fc8-5738-4642-9fdd-b388a4965546) |
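In the cross-lingual demo, the target text carries a leading language tag (`<|en|>`) that tells the model which language to synthesize. As a small illustration of this convention (the helper below is hypothetical, not part of CosyVoice), such a tag can be split from the text like this:

```python
import re

def split_lang_tag(text):
    """Split a leading <|xx|> language tag from CosyVoice-style target text.

    Returns (lang, remainder); lang is None when no tag is present.
    """
    m = re.match(r"<\|([a-z]{2})\|>(.*)", text, flags=re.S)
    if m:
        return m.group(1), m.group(2).strip()
    return None, text

print(split_lang_tag("<|en|>And then later on, fully acquiring that company."))
```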
|
### Coming Soon
Everyone is welcome to offer suggestions; they motivate me to keep updating the models and enriching the functionality of Linly-Talker.
## THG - Avatar
Detailed information about the usage and code implementation of digital human generation can be found in [THG - Building Intelligent Digital Humans](./TFG/README.md).
### SadTalker
Digital persona generation can utilize SadTalker (CVPR 2023). For detailed information, please visit [https://sadtalker.github.io](https://sadtalker.github.io).
Before usage, download the SadTalker model:
```bash
bash scripts/sadtalker_download_models.sh
```
[Baidu (百度云盘)](https://pan.baidu.com/s/1eF13O-8wyw4B3MtesctQyg?pwd=linl) (Password: `linl`)
[Quark (夸克网盘)](https://pan.quark.cn/s/f48f5e35796b)
> If downloading from Baidu Cloud, place the model in the `checkpoints` folder. The folder downloaded from Baidu Cloud is named `sadtalker` by default; rename it to `checkpoints`.
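The rename step above can be done in one command. A minimal sketch, assuming the Baidu Cloud archive extracts to a folder literally named `sadtalker` in the repository root:

```shell
# Stand-in for the extracted Baidu Cloud archive (replace with your real download).
mkdir -p sadtalker

# SadTalker expects its weights under `checkpoints`, so rename the folder.
mv sadtalker checkpoints
```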
### Wav2Lip
Digital persona generation can also utilize Wav2Lip (ACM 2020). For detailed information, refer to [https://github.com/Rudrabha/Wav2Lip](https://github.com/Rudrabha/Wav2Lip).
Before usage, download the Wav2Lip model:
| Model | Description | Link to the model |
| ---------------------------- | ----------------------------------------------------- | ------------------------------------------------------------ |
| Wav2Lip | Highly accurate lip-sync | [Link](https://iiitaphyd-my.sharepoint.com/:u:/g/personal/radrabha_m_research_iiit_ac_in/Eb3LEzbfuKlJiR600lQWRxgBIY27JZg80f7V9jtMfbNDaQ?e=TBFBVW) |
| Wav2Lip + GAN | Slightly inferior lip-sync, but better visual quality | [Link](https://iiitaphyd-my.sharepoint.com/:u:/g/personal/radrabha_m_research_iiit_ac_in/EdjI7bZlgApMqsVoEUUXpLsBxqXbn5z8VTmoxp55YNDcIA?e=n9ljGW) |
| Expert Discriminator | Weights of the expert discriminator | [Link](https://iiitaphyd-my.sharepoint.com/:u:/g/personal/radrabha_m_research_iiit_ac_in/EQRvmiZg-HRAjvI6zqN9eTEBP74KefynCwPWVmF57l-AYA?e=ZRPHKP) |
| Visual Quality Discriminator | Weights of the visual quality discriminator trained in a GAN setup | [Link](https://iiitaphyd-my.sharepoint.com/:u:/g/personal/radrabha_m_research_iiit_ac_in/EQVqH88dTm1HjlK11eNba5gBbn15WMS0B0EZbDBttqrqkg?e=ic0ljo) |
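After downloading, it is worth sanity-checking that the weight files sit where the inference code will look for them. A minimal sketch; the filenames below follow the conventional Wav2Lip checkpoint names (`wav2lip.pth`, `wav2lip_gan.pth`) and should be adjusted if yours differ:

```python
import os

# Conventional locations for the two main Wav2Lip checkpoints.
expected = [
    "checkpoints/wav2lip.pth",      # base model: highly accurate lip-sync
    "checkpoints/wav2lip_gan.pth",  # GAN variant: better visual quality
]

def missing_checkpoints(paths):
    """Return the subset of checkpoint paths that do not exist on disk yet."""
    return [p for p in paths if not os.path.isfile(p)]

# Prints any checkpoints that still need to be downloaded.
print(missing_checkpoints(expected))
```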
### Wav2Lipv2
Inspired by the repository [https://github.com/primepake/wav2lip_288x288](https://github.com/primepake/wav2lip_288x288), Wav2Lipv2 uses a newly trained 288x288 model to achieve higher-quality results.
Additionally, employing YOLO for face detection improves the overall result. You can compare and test the two models in Linly-Talker. The model has been updated, and the comparison is as follows:
| Wav2Lip | Wav2Lipv2 |
| ------------------------------------------------------------ | ------------------------------------------------------------ |