Wav2Vec2, Hubert) and any pretrained autoregressive model as the decoder.
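
As a minimal sketch of this encoder/decoder pairing, the snippet below builds a `SpeechEncoderDecoderModel` from a Wav2Vec2 encoder config and a BERT decoder config. The tiny layer sizes and the choice of BERT as the decoder are assumptions made only so the example runs quickly and offline; the weights are randomly initialized rather than pretrained.

```python
from transformers import (
    BertConfig,
    SpeechEncoderDecoderConfig,
    SpeechEncoderDecoderModel,
    Wav2Vec2Config,
)

# Illustrative, deliberately tiny configs (assumed sizes, not real checkpoints).
encoder_config = Wav2Vec2Config(
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
)
decoder_config = BertConfig(
    vocab_size=100,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
)

# Combining the configs marks the decoder as a decoder with cross-attention.
config = SpeechEncoderDecoderConfig.from_encoder_decoder_configs(
    encoder_config, decoder_config
)

# Randomly initialized speech encoder plus autoregressive decoder.
model = SpeechEncoderDecoderModel(config=config)

print(type(model.encoder).__name__)
print(type(model.decoder).__name__)
```

To start from pretrained weights instead, `SpeechEncoderDecoderModel.from_encoder_decoder_pretrained` takes an encoder checkpoint and a decoder checkpoint, for example `"facebook/wav2vec2-base-960h"` and `"bert-base-uncased"` (checkpoint names chosen here for illustration). The cross-attention layers are newly initialized in that case, so the combined model would still need fine-tuning on a speech-to-text task.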