For Pix2Struct models, we have found that fine-tuning the model with Adafactor and a cosine learning rate scheduler leads to faster convergence:
```python
from transformers.optimization import Adafactor, get_cosine_schedule_with_warmup

# Adafactor with a fixed learning rate (relative_step disabled) and a cosine schedule with warmup.
# `self.parameters()` assumes this code runs inside a module (e.g. a PyTorch Lightning `LightningModule`).
optimizer = Adafactor(self.parameters(), scale_parameter=False, relative_step=False, lr=0.01, weight_decay=1e-05)
scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=1000, num_training_steps=40000)
```
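As a minimal sketch of how the optimizer and scheduler above could be used, the snippet below steps both once per optimization step in a plain PyTorch training loop; `model` and `train_dataloader` are assumed to be defined elsewhere, with batches containing labels so the model returns a loss.

```python
# Hypothetical training loop: `model` and `train_dataloader` are placeholders.
for batch in train_dataloader:
    outputs = model(**batch)
    loss = outputs.loss
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the cosine schedule once per optimization step
    optimizer.zero_grad()
```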
MatCha is a model that is trained using the Pix2Struct architecture.