For instance, in Wav2Vec2ForPreTraining, the last two linear layers need to keep the default initialization of the regular PyTorch nn.Linear,
while all the other layers should use the initialization described above.
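
One way this special-casing might look is sketched below on a toy model. It is not the actual Wav2Vec2 code: the class ToyPreTrainingModel, the layer names encoder, project_hid and project_q, and the constant INITIALIZER_RANGE are hypothetical, and the "initialization as above" is assumed here to be a normal distribution with a small standard deviation plus zero biases. The sketch relies on nn.Module.apply visiting child modules before the parent, so the parent branch runs last and can restore the PyTorch default on the two projection layers.

```python
import torch
from torch import nn

# Assumed value for illustration; a real model would read it from its config.
INITIALIZER_RANGE = 0.02


class ToyPreTrainingModel(nn.Module):
    """Hypothetical stand-in for a pretraining head with two special layers."""

    def __init__(self, hidden_size: int = 16, proj_size: int = 8):
        super().__init__()
        self.encoder = nn.Linear(hidden_size, hidden_size)
        # The "last two linear layers" that should keep the nn.Linear default.
        self.project_hid = nn.Linear(hidden_size, proj_size)
        self.project_q = nn.Linear(hidden_size, proj_size)
        # apply() calls _init_weights on every submodule, then on self.
        self.apply(self._init_weights)

    def _init_weights(self, module: nn.Module) -> None:
        if isinstance(module, ToyPreTrainingModel):
            # Reached last: undo the custom init on the two projections by
            # re-running the default nn.Linear initialization.
            module.project_hid.reset_parameters()
            module.project_q.reset_parameters()
        elif isinstance(module, nn.Linear):
            # Assumed custom scheme: small normal weights, zero bias.
            nn.init.normal_(module.weight, mean=0.0, std=INITIALIZER_RANGE)
            if module.bias is not None:
                nn.init.zeros_(module.bias)


model = ToyPreTrainingModel()
```

After construction, model.encoder carries the custom initialization, while model.project_hid and model.project_q end up with the same weights distribution a freshly created nn.Linear would have.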