Update README.md
README.md
---
license: openrail++
library_name: diffusers
tags:
- lipsync
- video editing
pipeline_tag: video-to-video
---

Paper: https://arxiv.org/abs/2412.09262

Code: https://github.com/bytedance/LatentSync

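To pull these checkpoints locally before inference or training, a minimal sketch with `huggingface_hub` is shown below; the repo id is an assumption based on this model card, so adjust it to the actual repository name.

```python
# Minimal sketch: download the LatentSync 1.5 checkpoints from the Hub.
# The repo id below is an assumption based on this model card; adjust as needed.
from huggingface_hub import snapshot_download

ckpt_dir = snapshot_download(
    repo_id="ByteDance/LatentSync-1.5",  # assumed repo id
    local_dir="checkpoints",
)
print(f"Checkpoints downloaded to: {ckpt_dir}")
```
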
# What's new in LatentSync 1.5?

1. Added a temporal layer: our previous claim that the [temporal layer](https://arxiv.org/abs/2307.04725) severely impairs lip-sync accuracy was incorrect; the issue was actually caused by a bug in the code implementation. We have corrected our [paper](https://arxiv.org/abs/2412.09262) and updated the code. After incorporating the temporal layer, LatentSync 1.5 demonstrates significantly improved temporal consistency compared to version 1.0 (a minimal sketch of such a temporal layer appears after this list).

2. Improved performance on Chinese videos: many users reported poor results on Chinese videos, so we added Chinese data to the training of the new model version.

3. Reduced the VRAM requirement of stage2 training to **20 GB** through the following optimizations (illustrative sketches of several of these appear after this list):

   1. Implement gradient checkpointing in the U-Net, VAE, SyncNet, and VideoMAE.
   2. Replace xFormers with PyTorch's native implementation of FlashAttention-2.
   3. Clear the CUDA cache after loading checkpoints.
   4. Stage2 training only requires training the temporal layer and the audio cross-attention layer, which significantly reduces the VRAM requirement compared to the previous full-parameter fine-tuning.

   Now you can train LatentSync on a single **RTX 3090**! Start the stage2 training with `configs/unet/stage2_efficient.yaml`.

4. Other code optimizations:

   1. Remove the dependency on xFormers and Triton.
   2. Upgrade the diffusers version to `0.32.2`.

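As referenced in item 1 above, the temporal layer attends along the frame axis in the spirit of AnimateDiff. The sketch below is an illustrative minimal version of such a layer, not LatentSync's actual module; shapes and sizes are made up for the example.

```python
# Illustrative sketch of a temporal self-attention layer in the spirit of
# AnimateDiff (https://arxiv.org/abs/2307.04725): each spatial position attends
# across frames, which is what improves temporal consistency. Not the actual
# LatentSync module.
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels, height, width)
        b, f, c, h, w = x.shape
        # Fold spatial positions into the batch so attention mixes frames only.
        tokens = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, f, c)
        normed = self.norm(tokens)
        attended, _ = self.attn(normed, normed, normed, need_weights=False)
        tokens = tokens + attended  # residual connection
        return tokens.reshape(b, h, w, f, c).permute(0, 3, 4, 1, 2)

latents = torch.randn(2, 16, 64, 8, 8)       # 16 frames of 8x8 latents
print(TemporalAttention(64)(latents).shape)  # torch.Size([2, 16, 64, 8, 8])
```
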
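For optimizations 3.1 and 3.3, the sketch below shows how gradient checkpointing is typically switched on and how the CUDA cache can be cleared after loading; diffusers models expose `enable_gradient_checkpointing()`, while plain `nn.Module` encoders can be wrapped with `torch.utils.checkpoint`. The function names and the `unet`/`vae`/`encoder` variables are illustrative stand-ins, not LatentSync's actual code.

```python
# Sketch for optimizations 3.1 and 3.3: trade compute for activation memory with
# gradient checkpointing, and release allocator blocks left over after loading.
# `unet`, `vae`, and `encoder` stand for already-constructed models.
import torch
from torch.utils.checkpoint import checkpoint

def enable_gradient_checkpointing(unet, vae):
    # diffusers models (U-Net, VAE) expose a one-line switch.
    unet.enable_gradient_checkpointing()
    vae.enable_gradient_checkpointing()

def encode_with_checkpointing(encoder, frames):
    # For plain nn.Module encoders (e.g. SyncNet- or VideoMAE-style models),
    # recompute activations during the backward pass instead of storing them.
    return checkpoint(encoder, frames, use_reentrant=False)

def load_and_trim(load_fn):
    models = load_fn()
    # Optimization 3.3: drop cached blocks held by temporary loading buffers.
    torch.cuda.empty_cache()
    return models
```
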
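For optimization 3.2, PyTorch's built-in `scaled_dot_product_attention` dispatches to a FlashAttention-2 kernel on supported GPUs (PyTorch 2.2 and later), so the xFormers call can be dropped. A minimal sketch with illustrative shapes:

```python
# Sketch for optimization 3.2: attention via PyTorch's native
# scaled_dot_product_attention, which selects a FlashAttention-2 kernel on
# supported GPUs (PyTorch >= 2.2) without an xFormers dependency.
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    return F.scaled_dot_product_attention(q, k, v)

q = k = v = torch.randn(1, 8, 256, 64)
print(attention(q, k, v).shape)  # torch.Size([1, 8, 256, 64])
```
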
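For optimization 3.4, stage2 only updates the temporal and audio cross-attention parameters. A hedged sketch of the freezing logic is below; the parameter-name patterns are assumptions and depend on how the LatentSync code actually names its modules.

```python
# Sketch for optimization 3.4: freeze the U-Net and keep only the temporal and
# audio cross-attention parameters trainable. The name patterns are assumptions;
# the real patterns depend on LatentSync's module naming.
import torch

def trainable_parameters(unet, patterns=("temporal", "attn2")):
    selected = []
    for name, param in unet.named_parameters():
        param.requires_grad = any(p in name for p in patterns)
        if param.requires_grad:
            selected.append(param)
    return selected

# The optimizer then only sees the unfrozen subset, e.g.:
# optimizer = torch.optim.AdamW(trainable_parameters(unet), lr=1e-5)
```
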
## LatentSync 1.5 Demo

<table class="center">
  <tr style="font-weight: bolder;text-align:center;">
    <td width="50%"><b>Original video</b></td>
    <td width="50%"><b>Lip-synced video</b></td>
  </tr>
  <tr>
    <td>
      <video src=https://github.com/user-attachments/assets/b0c8d1da-3fdc-4946-9800-1b2fd0ef9c7f controls preload></video>
    </td>
    <td>
      <video src=https://github.com/user-attachments/assets/25dd1733-44c7-42fe-805a-d612d4bc30e0 controls preload></video>
    </td>
  </tr>
  <tr>
    <td>
      <video src=https://github.com/user-attachments/assets/4e48e501-64b4-4b4f-a69c-ed18dd987b1f controls preload></video>
    </td>
    <td>
      <video src=https://github.com/user-attachments/assets/e690d91b-9fe5-4323-a60e-2b7f546f01bc controls preload></video>
    </td>
  </tr>
  <tr>
    <td>
      <video src=https://github.com/user-attachments/assets/e84e2c13-1deb-41f7-8382-048ba1922b71 controls preload></video>
    </td>
    <td>
      <video src=https://github.com/user-attachments/assets/5a5ba09f-590b-4eb3-8dfb-a199d8d1e276 controls preload></video>
    </td>
  </tr>
  <tr>
    <td>
      <video src=https://github.com/user-attachments/assets/11e4b2b6-64f4-4617-b005-059209fcaea5 controls preload></video>
    </td>
    <td>
      <video src=https://github.com/user-attachments/assets/38437475-3c90-4d08-b540-c8e819e93e0d controls preload></video>
    </td>
  </tr>
</table>