Alternatively, one can further fine-tune the entire model on a downstream dataset, similar to BERT.