are pretrained transformer models initially trained to predict the next token given some input text.
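To make the next-token objective concrete, here is a minimal sketch in plain Python. It is a toy stand-in, not a transformer: it replaces the neural network with simple bigram counts, so it only looks at the last token of the input. The names `train_bigram`, `next_token_probs`, and `predict_next`, as well as the tiny corpus, are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, context):
    """Return a dict mapping candidate next tokens to probabilities.

    Assumption: only the last token of the context is used (bigram model).
    A real transformer conditions on the whole input sequence.
    """
    if not context:
        return {}
    followers = counts.get(context[-1], Counter())
    total = sum(followers.values())
    if total == 0:
        return {}
    return {tok: n / total for tok, n in followers.items()}


def predict_next(counts, context):
    """Greedy prediction: pick the most probable next token, or None."""
    probs = next_token_probs(counts, context)
    return max(probs, key=probs.get) if probs else None


# Toy corpus, whitespace-tokenized for simplicity.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

prediction = predict_next(model, ["on", "the"])
print(prediction)
```

The shape of the task is the same as for a real model: given some input tokens, produce a probability distribution over the vocabulary for the next position, then choose a token from it. What differs is how that distribution is computed and how much of the input it can take into account.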