Each speech candidate is passed through the speech encoder ([ClvpEncoder]) which converts them into a vector representation, and the text encoder ([ClvpEncoder]) converts the text tokens into the same latent space. |
Each speech candidate is passed through the speech encoder ([ClvpEncoder]) which converts them into a vector representation, and the text encoder ([ClvpEncoder]) converts the text tokens into the same latent space. |