Implementation details

The main differences compared to GPT2.