Generate text and speech from audio, video, and text input
Conversational speech generation
Compare two audio samples to determine whether they come from the same speaker