During fine-tuning, each layer's relative position bias is initialized with the shared relative position bias obtained after pre-training.
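
The initialization described here can be illustrated with a minimal sketch. It assumes the shared bias is a small table indexed by relative-position bucket and attention head; the source does not specify the table's layout or the framework, so the plain-list representation and the names `init_per_layer_bias`, `shared_bias`, and `layer_biases` are hypothetical. The point is only that each layer starts from the same pre-trained values but holds its own copy, so fine-tuning can update the layers independently.

```python
import copy


def init_per_layer_bias(shared_bias, num_layers):
    """Return one independent copy of the shared bias table per layer."""
    return [copy.deepcopy(shared_bias) for _ in range(num_layers)]


# Hypothetical shared table from pre-training:
# rows = relative-position buckets, columns = attention heads.
shared_bias = [[0.1, -0.2], [0.0, 0.3], [-0.1, 0.05]]

# Every layer begins fine-tuning with identical values.
layer_biases = init_per_layer_bias(shared_bias, num_layers=4)

# Simulate a fine-tuning update that touches only layer 0.
layer_biases[0][1][0] += 0.5
```

After the simulated update, layer 0 has diverged while the other layers and the original shared table are unchanged, which is the behavior a deep copy (rather than a shared reference) is meant to provide.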