One can use [T5ForConditionalGeneration] (or its TensorFlow/Flax counterpart, [TFT5ForConditionalGeneration] or [FlaxT5ForConditionalGeneration]), which includes the language modeling head on top of the decoder.
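The following is a minimal PyTorch sketch of what the language modeling head provides. To keep it runnable without downloading weights, it builds a tiny, randomly initialized model from a `T5Config`. The config values (`vocab_size=100`, `d_model=32`, and so on) are arbitrary illustrative assumptions rather than a real checkpoint, and the token ids are random. Passing `labels` makes the model build the decoder inputs by shifting the labels right. It then returns per-token vocabulary logits from `lm_head` together with a cross-entropy loss.

```python
import torch
from transformers import T5Config, T5ForConditionalGeneration

torch.manual_seed(0)

# Tiny, randomly initialized config (illustrative values, not a real checkpoint)
# so the example runs without downloading pretrained weights.
config = T5Config(
    vocab_size=100,
    d_model=32,
    d_kv=8,
    d_ff=64,
    num_layers=2,
    num_heads=4,
    pad_token_id=0,
    decoder_start_token_id=0,  # T5 starts decoding from the pad token id
)
model = T5ForConditionalGeneration(config)

# The LM head is a linear projection from the decoder's hidden size to the vocabulary.
print(model.lm_head)

# Encoder inputs and target labels (random token ids, for illustration only).
input_ids = torch.randint(1, config.vocab_size, (2, 7))
labels = torch.randint(1, config.vocab_size, (2, 5))

# With labels, the model shifts them right to form decoder inputs and
# returns a cross-entropy loss alongside the per-token logits.
outputs = model(input_ids=input_ids, labels=labels)
print(outputs.logits.shape)  # (batch, target_len, vocab_size)
print(outputs.loss)
```

For actual use, the weights would typically come from a pretrained checkpoint via `T5ForConditionalGeneration.from_pretrained(...)` (for example `"google-t5/t5-small"`), paired with the matching tokenizer, and `model.generate(...)` would produce output sequences at inference time.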