During fine-tuning, each layer's relative position
bias is initialized with the shared relative position bias obtained after pre-training.
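
The initialization described above can be sketched in a few lines. This is a minimal illustration, not the actual model code: the helper `init_per_layer_bias`, the table shape (relative positions by attention heads), and the numeric values are all hypothetical, and plain Python lists stand in for tensors.

```python
import copy

def init_per_layer_bias(shared_bias, num_layers):
    """Give every layer its own copy of the shared relative position bias.

    Hypothetical helper: each layer starts from the pre-trained values,
    but the copies are independent, so fine-tuning can update them separately.
    """
    return [copy.deepcopy(shared_bias) for _ in range(num_layers)]

# Assumed shared table learned during pre-training:
# rows are relative positions, columns are attention heads.
shared_bias = [[0.1, -0.2], [0.0, 0.3], [0.5, -0.1]]

layer_biases = init_per_layer_bias(shared_bias, num_layers=4)

# Simulate a fine-tuning update to layer 0 only.
layer_biases[0][0][0] = 9.9
```

Because each layer holds a deep copy, the update to layer 0 leaves the other layers and the original shared table unchanged.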