Masked autoencoder for image recognition; today's reading.

image-recognition masked-autoencoder today-i-read transformers

  1. Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick, “Masked Autoencoders Are Scalable Vision Learners,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2022, pp. 16000–16009, doi: 10.1109/CVPR52688.2022.01553.

    tl;dr: Proposes the masked autoencoder (MAE) for pretraining a Vision Transformer (ViT) for image recognition. The autoencoder is trained on a reconstruction task with an asymmetric design: the encoder operates only on the visible (unmasked) patches, while a lightweight decoder takes the encoded patches plus mask tokens and reconstructs the masked pixels. For image recognition, the decoder is discarded and the encoder is fine-tuned. Best results are with a ViT-Huge model; experiments are on ImageNet-1K. Ablations abound in the paper. A sketch of the masking step is below.
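
    To make the asymmetric design concrete, here is a minimal NumPy sketch of the per-image random masking and the mask-token re-insertion, under my reading of the paper; function names, shapes, and the toy data are my own, not the authors' released code.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def random_masking(patches, mask_ratio=0.75):
        """Per-image random masking via shuffling (sketch, not the authors' code).

        patches: (num_patches, dim) array of patch embeddings.
        Returns the visible patches (the encoder's only input), the indices
        that restore the original order, and a binary mask (1 = masked).
        """
        n, d = patches.shape
        n_keep = int(n * (1 - mask_ratio))

        noise = rng.random(n)                  # one random score per patch
        ids_shuffle = np.argsort(noise)        # ascending: first n_keep kept
        ids_restore = np.argsort(ids_shuffle)  # inverse permutation

        ids_keep = ids_shuffle[:n_keep]
        visible = patches[ids_keep]            # encoder never sees masked patches

        mask = np.ones(n)
        mask[:n_keep] = 0
        mask = mask[ids_restore]               # 1 where a patch was masked
        return visible, ids_restore, mask

    def decoder_input(encoded_visible, ids_restore, mask_token):
        """Re-insert mask tokens so the decoder sees the full patch sequence."""
        n = ids_restore.shape[0]
        n_keep = encoded_visible.shape[0]
        mask_tokens = np.tile(mask_token, (n - n_keep, 1))
        full = np.concatenate([encoded_visible, mask_tokens], axis=0)
        return full[ids_restore]               # undo the shuffle

    # Toy usage: 16 patches with 8-dim embeddings, 75% masked.
    patches = rng.standard_normal((16, 8))
    visible, ids_restore, mask = random_masking(patches)
    print(visible.shape)  # (4, 8): the encoder processes only 25% of patches
    dec_in = decoder_input(visible, ids_restore, np.zeros((1, 8)))
    print(dec_in.shape)   # (16, 8): full sequence with mask tokens restored
    ```

    The encoder-side saving is the point of the asymmetry: with a 75% mask ratio, the (large) encoder runs on a quarter of the tokens, and only the small decoder ever processes the full sequence.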
