Efficient Content-Based Sparse Attention with Routing Transformers — arXiv:2003.05997, published Mar 12, 2020
Scaling Local Self-Attention for Parameter Efficient Visual Backbones — arXiv:2103.12731, published Mar 23, 2021
Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers — arXiv:2109.10686, published Sep 22, 2021