Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to: model.compile(optimizer=optimizer) # No loss argument!