This model is very slow: generating one minute of audio with the 5b top prior takes 8 hours on a V100 GPU.
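
To put that figure in perspective, here is a minimal back-of-the-envelope sketch in Python. It uses only the numbers quoted above (8 hours per 60 seconds of audio) and assumes, as a simplification that is not stated in the source, that generation time scales linearly with audio length. The function names are hypothetical and are not part of any library.

```python
# Figure quoted above: 8 hours of compute per 1 minute of generated audio
# (5b top prior, V100 GPU).
HOURS_PER_MINUTE_OF_AUDIO = 8.0


def estimated_generation_hours(audio_seconds: float) -> float:
    """Rough wall-clock estimate in hours.

    ASSUMPTION: cost scales linearly with audio length. Real runs may
    deviate from this (fixed setup cost, context length, hardware).
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return HOURS_PER_MINUTE_OF_AUDIO * audio_seconds / 60.0


def slowdown_vs_real_time() -> float:
    """Seconds of compute needed per second of generated audio."""
    return HOURS_PER_MINUTE_OF_AUDIO * 3600.0 / 60.0


if __name__ == "__main__":
    print(f"15 s clip: ~{estimated_generation_hours(15):.1f} h")
    print(f"Slowdown vs real time: {slowdown_vs_real_time():.0f}x")
```

Under that linear assumption, each second of audio costs roughly 480 seconds of compute, so it is worth prototyping with short clips before committing to a full-length sample.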