For Pix2Struct models, we have found that fine-tuning the model with Adafactor and a cosine learning rate scheduler leads to faster convergence:
```python
from transformers.optimization import Adafactor, get_cosine_schedule_with_warmup

optimizer = Adafactor(self.parameters(), scale_parameter=False, relative_step=False, lr=0.01, weight_decay=1e-05)
scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=1000, num_training_steps=40000)
```
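As a rough sketch of how these pieces fit into a standard PyTorch training loop, the optimizer and scheduler can be stepped after each backward pass. In the example below, `train_dataloader` is a hypothetical `DataLoader` yielding `Pix2StructProcessor` outputs with labels, and the checkpoint name is only illustrative:

```python
import torch
from transformers import Pix2StructForConditionalGeneration
from transformers.optimization import Adafactor, get_cosine_schedule_with_warmup

model = Pix2StructForConditionalGeneration.from_pretrained("google/pix2struct-base")

optimizer = Adafactor(model.parameters(), scale_parameter=False, relative_step=False, lr=0.01, weight_decay=1e-05)
scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=1000, num_training_steps=40000)

model.train()
for batch in train_dataloader:  # hypothetical DataLoader of processor outputs
    outputs = model(
        flattened_patches=batch["flattened_patches"],
        attention_mask=batch["attention_mask"],
        labels=batch["labels"],
    )
    loss = outputs.loss
    loss.backward()
    optimizer.step()   # update parameters with Adafactor
    scheduler.step()   # advance the cosine learning rate schedule
    optimizer.zero_grad()
```

Note that the scheduler is stepped per optimizer update, so `num_training_steps` should match the total number of optimizer steps planned for the run.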
MatCha is a model trained using the Pix2Struct architecture.