It should also add the speaker embeddings as an additional input.
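As a minimal sketch of this step, assuming each example is a plain dictionary and that "additional input" means an extra field stored alongside the existing ones: the key names `audio` and `speaker_embeddings`, the helper `add_speaker_embedding`, and the stand-in encoder `toy_embedding` are all hypothetical and chosen only for illustration. A real pipeline would use a trained speaker encoder in place of the toy function.

```python
from typing import Any, Callable, Dict, List, Sequence

Example = Dict[str, Any]


def add_speaker_embedding(
    example: Example,
    embed_fn: Callable[[Sequence[float]], List[float]],
    audio_key: str = "audio",  # hypothetical field name for the waveform
) -> Example:
    """Return a copy of `example` with a `speaker_embeddings` entry added."""
    out = dict(example)  # shallow copy so the caller's dict is not mutated
    out["speaker_embeddings"] = embed_fn(example[audio_key])
    return out


def toy_embedding(waveform: Sequence[float]) -> List[float]:
    """Stand-in for a real speaker encoder: returns [mean, peak amplitude]."""
    n = len(waveform)
    mean = sum(waveform) / n if n else 0.0
    peak = max((abs(x) for x in waveform), default=0.0)
    return [mean, peak]


example = {"text": "hello", "audio": [0.0, 0.5, -1.0, 0.5]}
prepared = add_speaker_embedding(example, toy_embedding)
```

Passing the encoder in as `embed_fn` keeps the preprocessing step independent of any particular speaker model, so the same function can be reused when the encoder changes.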