Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to: import tensorflow as tf model.compile(optimizer=optimizer) # No loss argument!