PVT inherits the advantages of both CNNs and Transformers, making it a unified,
convolution-free backbone for various vision tasks that can serve as a direct replacement for CNN backbones.