MeiGen-AI/MeiGen-MultiTalk

norris committed on Jun 9

Commit 67f33b5 · verified · 1 Parent(s): cf27c2b

Update README.md

add method description

Files changed (1)

README.md +6 -1

@@ -11,7 +11,7 @@ pipeline_tag: image-to-video
 ---
 <p align="center">
-  <img src="assets/logo2.jpeg" alt="MultiTalk" width="240"/>
+  <img src="assets/logo2.jpeg" alt="MultiTalk" width="300"/>
 </p>
 # MeiGen-MultiTalk • Audio-Driven Multi-Person Conversational Video Generation
@@ -56,6 +56,11 @@ This repository hosts the model weights for **MultiTalk**. For installation, usa
 ## Method
+We propose MultiTalk, a novel framework for audio-driven multi-person conversational video generation. We investigate several schemes for audio injection and introduce
+the Label Rotary Position Embedding (L-RoPE) method. By assigning identical labels to audio embeddings and video latents, L-RoPE activates specific regions within the audio cross-attention
+map, thereby resolving incorrect audio-person binding. To localize the region of a specified person, we introduce adaptive person localization, which computes the similarity
+between the features of that person's region in the reference image and all the features of the whole video.
 <p align="left"><img src="assets/pipe.png" width="80%"></p>
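
The Method text added in this commit names two mechanisms: label-based rotary embeddings and similarity-based person localization. The following is a minimal pure-Python sketch of both ideas, not the MultiTalk implementation. All function names, the label values, the mean pooling of reference features, and the use of cosine similarity are assumptions made for illustration. The first part shows the general rotary-embedding property that two vectors rotated with the same label keep their unrotated dot product, while different labels change it. The second part scores each video token against a pooled reference-region feature.

```python
import math


def rotate_by_label(vec, label, base=10000.0):
    """RoPE-style rotation of `vec`, using `label` as the position index.

    Hypothetical helper: each consecutive pair (vec[i], vec[i + 1]) is rotated
    by the angle label * base ** (-i / d), as in standard rotary embeddings.
    """
    d = len(vec)
    if d % 2:
        raise ValueError("feature dimension must be even")
    out = []
    for i in range(0, d, 2):
        theta = label * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [vec[i] * c - vec[i + 1] * s, vec[i] * s + vec[i + 1] * c]
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def localize_person(ref_region_feats, video_feats):
    """Cosine similarity between a pooled reference feature and every video token.

    Assumption: the reference-region features are mean-pooled into one vector.
    The source only states that a similarity is computed.
    """
    n = len(ref_region_feats)
    mean_ref = [sum(col) / n for col in zip(*ref_region_feats)]
    return [cosine(mean_ref, tok) for tok in video_feats]


# Part 1: same label on both sides leaves the dot product unchanged;
# a different label rotates the vectors apart (labels 2 and 5 are arbitrary).
feat = [1.0, 0.0, 0.5, 0.5]
same_label_score = dot(rotate_by_label(feat, 2), rotate_by_label(feat, 2))
diff_label_score = dot(rotate_by_label(feat, 2), rotate_by_label(feat, 5))

# Part 2: toy 2-D features; the token most similar to the reference region
# is taken as belonging to the specified person.
ref_region = [[1.0, 0.0], [0.9, 0.1]]
video_tokens = [[1.0, 0.05], [0.0, 1.0], [-1.0, 0.0]]
similarities = localize_person(ref_region, video_tokens)
best_token = max(range(len(similarities)), key=similarities.__getitem__)
```

In this toy setup the matching-label score is the larger of the two, and the first video token is selected as the person's region. A real model would operate on learned high-dimensional features and attention maps rather than hand-written vectors.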