The sliding-window pattern lets the receptive field of Neighborhood Attention (NA) grow without extra pixel shifts and preserves translational equivariance, unlike Swin Transformer's Window Self Attention (WSA).
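
To make the receptive-field claim concrete, here is a minimal 1D sketch in plain Python. It is not the reference implementation: the function names `na_neighbors`, `wsa_neighbors`, and `receptive_field` are hypothetical, the sequence is one-dimensional rather than a 2D feature map, and the border handling (shifting the window inward so it always holds `kernel` positions) is an assumption about how NA treats edges. The WSA variant models fixed, non-overlapping windows with no shift step.

```python
def na_neighbors(i, length, kernel):
    """Positions attended to by query i under sliding-window (NA-style) attention.

    Assumption: the window is centred on i and shifted inward at the
    borders so that it always contains exactly `kernel` positions.
    """
    start = min(max(i - kernel // 2, 0), length - kernel)
    return set(range(start, start + kernel))


def wsa_neighbors(i, length, kernel):
    """Positions attended to by query i under fixed, non-overlapping windows
    (WSA-style partitioning with no shift between layers)."""
    start = (i // kernel) * kernel
    return set(range(start, min(start + kernel, length)))


def receptive_field(i, length, kernel, layers, neighbors):
    """Set of input positions that can influence position i after `layers`
    stacked attention layers using the given neighbourhood function."""
    field = {i}
    for _ in range(layers):
        field = set().union(*(neighbors(j, length, kernel) for j in field))
    return field


if __name__ == "__main__":
    length, kernel, query = 32, 3, 16
    na_sizes = [len(receptive_field(query, length, kernel, n, na_neighbors))
                for n in (1, 2, 3)]
    wsa_sizes = [len(receptive_field(query, length, kernel, n, wsa_neighbors))
                 for n in (1, 2, 3)]
    print(na_sizes)   # [3, 5, 7]: grows by kernel - 1 per layer
    print(wsa_sizes)  # [3, 3, 3]: stuck inside one window without shifts
```

In this toy setting the sliding window widens the receptive field with every layer, while the fixed partition never leaves its window unless an extra shift is introduced. For interior positions the sliding-window neighbourhood of `i + 1` is the neighbourhood of `i` moved by one, which is the translational behaviour the fixed partition lacks.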