Update README.md
add method description
README.md
CHANGED
@@ -11,7 +11,7 @@ pipeline_tag: image-to-video
 ---
 
 <p align="center">
-<img src="assets/logo2.jpeg" alt="MultiTalk" width="
+<img src="assets/logo2.jpeg" alt="MultiTalk" width="300"/>
 </p>
 
 # MeiGen-MultiTalk • Audio-Driven Multi-Person Conversational Video Generation
@@ -56,6 +56,11 @@ This repository hosts the model weights for **MultiTalk**. For installation, usa
 
 
 ## Method
+We propose a novel framework, MultiTalk, for audio-driven multi-person conversational video generation. We investigate several schemes for audio injection and introduce
+the Label Rotary Position Embedding (L-RoPE) method. By assigning identical labels to audio embeddings and video latents, it effectively activates specific regions within the audio cross-attention
+map, thereby resolving incorrect binding issues. To localize the region of the specified person, we introduce the adaptive person localization by computing the similarity
+between the features of the given region of a person in the reference image and all the features of the whole video.
+
 <p align="left"><img src="assets/pipe.png" width="80%"></p>
 
 
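The two ideas in the added Method paragraph can be illustrated with a small NumPy sketch. This is not the MultiTalk implementation: the function names (`localize_person`, `assign_labels`, `rotate_by_label`), the use of mean cosine similarity, the argmax assignment, and the single rotation frequency `theta` are all assumptions chosen for illustration. The first part scores every video token against the features of each person's region in the reference image. The second part shows the general rotary property that a label-based embedding could rely on: when a query and a key are rotated by the same label, their dot product is unchanged, and otherwise it depends only on the label difference.

```python
import numpy as np


def localize_person(ref_region_feats, video_feats):
    """Score each video token against one person's reference-region features.

    ref_region_feats: (R, D) features from the person's region in the reference image.
    video_feats:      (N, D) features for all video tokens.
    Returns an (N,) array of mean cosine similarities (an assumed scoring rule).
    """
    ref = ref_region_feats / np.linalg.norm(ref_region_feats, axis=1, keepdims=True)
    vid = video_feats / np.linalg.norm(video_feats, axis=1, keepdims=True)
    sim = vid @ ref.T          # (N, R) pairwise cosine similarities
    return sim.mean(axis=1)    # average over the reference-region tokens


def assign_labels(ref_feats_per_person, video_feats):
    """Give each video token the index of the best-matching person (hypothetical rule)."""
    scores = np.stack(
        [localize_person(ref, video_feats) for ref in ref_feats_per_person], axis=0
    )                          # (P, N)
    return scores.argmax(axis=0)


def rotate_by_label(x, labels, theta=0.5):
    """Rotate consecutive feature pairs of each row by the angle label * theta.

    Simplified rotary embedding with a single frequency (an assumption);
    x has shape (N, D) with D even, labels has shape (N,).
    """
    x = np.asarray(x, dtype=float)
    angles = np.asarray(labels, dtype=float)[:, None] * theta
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


# Toy data: two people with orthogonal reference features, three video tokens.
person_refs = [np.array([[1.0, 0.0, 0.0, 0.0]]), np.array([[0.0, 1.0, 0.0, 0.0]])]
video = np.array([[0.9, 0.1, 0.0, 0.0], [0.1, 0.8, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]])
token_labels = assign_labels(person_refs, video)

# Attention-style score between a video query and an audio key under label rotation.
q = np.array([[1.0, 2.0, 3.0, 4.0]])
k = np.array([[0.5, -1.0, 2.0, 0.0]])
plain_score = (q @ k.T).item()
same_label_score = (rotate_by_label(q, [1]) @ rotate_by_label(k, [1]).T).item()
diff_label_score = (rotate_by_label(q, [0]) @ rotate_by_label(k, [1]).T).item()
```

In this toy setup, `token_labels` marks which person each video token most resembles, and `same_label_score` equals `plain_score` because the two rotations cancel. How MultiTalk actually chooses labels and frequencies is not stated in this commit, so those details are left open here.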