norris committed on
Commit 67f33b5 · verified · 1 Parent(s): cf27c2b

Update README.md

add method description

Files changed (1)
  1. README.md +6 -1
README.md CHANGED
@@ -11,7 +11,7 @@ pipeline_tag: image-to-video
 ---
 
 <p align="center">
-<img src="assets/logo2.jpeg" alt="MultiTalk" width="240"/>
+<img src="assets/logo2.jpeg" alt="MultiTalk" width="300"/>
 </p>
 
 # MeiGen-MultiTalk • Audio-Driven Multi-Person Conversational Video Generation
@@ -56,6 +56,11 @@ This repository hosts the model weights for **MultiTalk**. For installation, usa
 
 
 ## Method
+We propose a novel framework, MultiTalk, for audio-driven multi-person conversational video generation. We investigate several schemes for audio injection and introduce
+the Label Rotary Position Embedding (L-RoPE) method. By assigning identical labels to audio embeddings and video latents, it effectively activates specific regions within the audio cross-attention
+map, thereby resolving incorrect binding issues. To localize the region of the specified person, we introduce the adaptive person localization by computing the similarity
+between the features of the given region of a person in the reference image and all the features of the whole video.
+
 <p align="left"><img src="assets/pipe.png" width="80%"></p>
 
 
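The method paragraph added in this commit names two mechanisms, L-RoPE and adaptive person localization, without showing how they work. The NumPy sketch below is a hypothetical illustration under stated assumptions, not the MultiTalk implementation. `rope_rotate` is a standard 1-D rotary embedding in which each token's label stands in for its position, so a video token and an audio token that share a label see zero relative rotation. `label_cross_attention` applies it inside a single-head audio cross-attention. `localize_persons` assumes one mean-pooled prototype per person from the reference-image region and cosine similarity against every video token, which is one plausible reading of "computing the similarity". All function names, label values, and the pooling choice are assumptions.

```python
import numpy as np


def rope_rotate(x: np.ndarray, labels: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """1-D rotary embedding where each token's label plays the role of its position.

    x: (n, d) with d even; labels: (n,) labels (hypothetical values, e.g. one per person).
    """
    d = x.shape[1]
    half = d // 2
    inv_freq = base ** (-np.arange(half) / half)  # (half,) standard RoPE frequencies
    angles = np.asarray(labels, dtype=float)[:, None] * inv_freq[None, :]  # (n, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1[i], x2[i]) pair by its angle ("rotate-half" layout).
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=1)


def label_cross_attention(video_q, video_labels, audio_k, audio_v, audio_labels):
    """Single-head cross-attention from video tokens (queries) to audio tokens
    (keys/values), with label-based RoPE on both sides. Returns (output, weights)."""
    q = rope_rotate(video_q, video_labels)
    k = rope_rotate(audio_k, audio_labels)
    scores = q @ k.T / np.sqrt(q.shape[1])  # (n_video, n_audio)
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ audio_v, weights


def localize_persons(ref_feats, person_masks, video_feats):
    """Assign each video token to the person whose reference-image region it most resembles.

    ref_feats: (m, d) reference-image tokens; person_masks: list of boolean (m,) masks,
    one per person; video_feats: (n, d). Returns (person index per token, similarity).
    """
    def unit(a):
        return a / np.linalg.norm(a, axis=-1, keepdims=True)

    # Hypothetical choice: one prototype per person = mean of that person's region features.
    protos = np.stack([ref_feats[mask].mean(axis=0) for mask in person_masks])  # (P, d)
    sim = unit(video_feats) @ unit(protos).T  # (n, P) cosine similarity
    return sim.argmax(axis=1), sim
```

In this sketch, the per-token person indices from `localize_persons` could be mapped to labels (for example person 0 to label 0 and person 1 to label 20, both hypothetical) before calling `label_cross_attention`. RoPE only makes each attention score depend on the label difference; the sketch itself does not force same-label pairs to score higher, so that behaviour would have to come from training.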