5fa1a76
1
2
Following BERT developed in the natural language processing area, we propose a masked image modeling task to pretrain vision Transformers.