CLVP can be used to compare different generated speech candidates with the provided text, and the best speech tokens are forwarded to the diffusion model.
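The selection step described above can be illustrated with a minimal sketch. It assumes that the text and each speech candidate have already been encoded into fixed-size embedding vectors, and that candidates are scored by cosine similarity against the text embedding. The function names `cosine_similarity` and `pick_best_candidate` and the toy vectors are hypothetical and are not part of any library API; the actual model works on learned embeddings and may score candidates differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pick_best_candidate(text_embedding, speech_embeddings):
    """Score every speech candidate against the text and return the best index.

    Hypothetical helper: assumes embeddings are precomputed and that a higher
    cosine similarity means a better match between speech and text.
    """
    scores = [cosine_similarity(text_embedding, s) for s in speech_embeddings]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Toy data (made up for illustration): one text embedding, three candidates.
text_embedding = [1.0, 0.0]
speech_embeddings = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]

best_index, scores = pick_best_candidate(text_embedding, speech_embeddings)
print(best_index, scores)
```

In this sketch, the speech tokens belonging to the candidate at `best_index` would be the ones passed on to the diffusion model.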