We show that the combined model at scale can generate high-fidelity and diverse songs with coherence up to multiple minutes.