The quality of the speaker embeddings appears to be a significant factor.