It's an encoder-decoder transformer pre-trained with a text-to-text denoising objective: it receives corrupted input text and learns to generate the missing text.
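
To make the denoising setup concrete, here is a minimal sketch of how one training pair could be built by masking spans of the input. It is an illustration under stated assumptions, not the model's actual preprocessing: the function name `span_corrupt`, the `<mask_N>` sentinel format, the whitespace tokenization, and the hand-picked spans are all hypothetical. Real pipelines typically work on subword tokens and sample spans at random.

```python
from typing import List, Sequence, Tuple


def span_corrupt(
    tokens: Sequence[str],
    spans: Sequence[Tuple[int, int]],
) -> Tuple[List[str], List[str]]:
    """Replace each (start, length) span with a sentinel token.

    Returns (source, target): the source is the corrupted text fed to the
    encoder, the target is what the decoder is trained to generate.
    """
    source: List[str] = []
    target: List[str] = []
    cursor = 0
    for i, (start, length) in enumerate(sorted(spans)):
        if length <= 0 or start < cursor or start + length > len(tokens):
            raise ValueError("spans must be non-empty, in range, and non-overlapping")
        sentinel = f"<mask_{i}>"  # hypothetical sentinel format, one per span
        source.extend(tokens[cursor:start])  # keep the uncorrupted text
        source.append(sentinel)              # hide the span in the source
        target.append(sentinel)              # mark which span follows
        target.extend(tokens[start:start + length])
        cursor = start + length
    source.extend(tokens[cursor:])  # keep any text after the last span
    return source, target


tokens = "Thank you for inviting me to your party last week".split()
source, target = span_corrupt(tokens, [(2, 2), (8, 1)])
print(" ".join(source))  # Thank you <mask_0> me to your party <mask_1> week
print(" ".join(target))  # <mask_0> for inviting <mask_1> last
```

Both sides of the pair are plain text, which is what makes the setting text-to-text: the encoder reads the corrupted sequence and the decoder generates the dropped spans.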