We tackle the long context of raw audio by using a multiscale VQ-VAE to compress it to discrete codes, and then modeling those codes with autoregressive Transformers.
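To make the compression step concrete, here is a minimal sketch of the vector-quantization lookup at the heart of a VQ-VAE, shown for a single level only. It assumes the encoder has already produced latent vectors; the function names, the toy codebook, and the toy latents are hypothetical illustrations, not the model's actual implementation.

```python
# Toy vector quantization: each latent vector is replaced by the index of its
# nearest codebook entry. All names and values here are illustrative assumptions.

def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector`."""
    def sq_dist(a, b):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))

def quantize(latents, codebook):
    """Compress a sequence of latent vectors into a sequence of discrete codes."""
    return [nearest_code(v, codebook) for v in latents]

def dequantize(codes, codebook):
    """Map discrete codes back to their codebook vectors for decoding."""
    return [codebook[k] for k in codes]

# Hypothetical 3-entry codebook and three encoder outputs.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
latents = [[0.1, -0.1], [0.9, 1.2], [-0.8, 0.7]]

codes = quantize(latents, codebook)
reconstruction = dequantize(codes, codebook)
```

The resulting integer sequence `codes` is what an autoregressive Transformer would then be trained to predict token by token. A multiscale setup would repeat this lookup at several temporal resolutions, each with its own codebook.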