Unlike ViT, which typically yields low-resolution outputs and incurs high
computational and memory costs, PVT can be trained on dense partitions of an image to achieve high
output resolution, which is important for dense prediction. It also uses a progressive shrinking pyramid to reduce the
computation required for large feature maps.
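
To make the shrinking pyramid concrete, the sketch below computes the feature-map size and token count at each stage. It is an illustration only, not the model's implementation: the helper name `pyramid_shapes` is hypothetical, and the per-stage strides of 4, 8, 16 and 32 and the 224x224 input are assumptions that are not stated in the text above.

```python
def pyramid_shapes(height, width, strides=(4, 8, 16, 32)):
    """Return (feature_height, feature_width, num_tokens) for each stage.

    `strides` is the assumed total downsampling factor of each stage
    relative to the input image; these values are an assumption here.
    """
    shapes = []
    for stride in strides:
        feat_h, feat_w = height // stride, width // stride
        # Each spatial position of the feature map becomes one token.
        shapes.append((feat_h, feat_w, feat_h * feat_w))
    return shapes


if __name__ == "__main__":
    # Assumed 224x224 input, used only for illustration.
    for stage, (h, w, n) in enumerate(pyramid_shapes(224, 224), start=1):
        print(f"stage {stage}: {h}x{w} feature map, {n} tokens")
```

Under these assumptions the first stage works on a dense 56x56 grid, which is the high output resolution that dense prediction benefits from, while each later stage has a quarter of the tokens of the one before it. For comparison, a single fixed stride of 16 would give a 14x14 grid at every layer.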