CLMAE: A Liter and Faster Masked Autoencoders

  • Yiran Song
  • Lizhuang Ma*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Self-supervised pre-training has been widely applied to various vision tasks and has achieved great success. However, pre-training on large datasets suffers from lengthy training schedules and high memory consumption. To alleviate these problems, we propose a lightweight model called the Convolutional Lite Masked AutoEncoder (CLMAE). To improve the convergence speed of the transformer during pre-training, we introduce a two-stage convolutional progressive patch embedding and an additional convolution in the feed-forward layer, which promote better correlation among patches in the spatial dimensions. The most important design is the cross-layer parameter-sharing mechanism, which reduces the number of model parameters with little impact on performance. We find that sharing parameters among layers not only improves parameter efficiency but also acts as a form of regularization that stabilizes training. Experimental results on downstream tasks show the effectiveness and generalization ability of CLMAE, which accelerates the training process significantly (by 5× compared with MAE on ViT-B) and cuts the parameter count by a quarter (25M fewer than ViT-B), while retaining competitive accuracy (82.8% on ImageNet-1K).
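The cross-layer parameter-sharing mechanism described above can be sketched with a toy encoder: a depth-D model reuses one small set of blocks instead of allocating fresh parameters at every layer, so the parameter count no longer scales with depth. Everything below (`make_block`, `encode`, the linear+ReLU stand-in "block") is a hypothetical illustration of the general sharing idea, not the authors' implementation.

```python
import numpy as np

def make_block(dim, rng):
    # One "layer" of parameters: a weight matrix and a bias, standing in
    # for a full transformer block for counting purposes.
    return {"W": rng.standard_normal((dim, dim)) * 0.01, "b": np.zeros(dim)}

def apply_block(block, x):
    # A toy layer: linear map + ReLU (not the paper's actual block).
    return np.maximum(x @ block["W"] + block["b"], 0.0)

def encode(x, blocks, depth):
    # Cross-layer parameter sharing: a depth-`depth` encoder cycles
    # through a short list of blocks, reusing their parameters.
    for i in range(depth):
        x = apply_block(blocks[i % len(blocks)], x)
    return x

def param_count(blocks):
    return sum(b["W"].size + b["b"].size for b in blocks)

rng = np.random.default_rng(0)
dim, depth = 8, 12
shared = [make_block(dim, rng)]                          # one block reused 12 times
unshared = [make_block(dim, rng) for _ in range(depth)]  # one block per layer
print(param_count(unshared) // param_count(shared))      # prints 12
```

With full sharing the 12-layer toy encoder holds one twelfth of the parameters of its unshared counterpart while still applying 12 layers of computation, which mirrors the paper's claim that sharing trades parameter count, not depth.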

Original language: English
Title of host publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728163277
State: Published - 2023
Externally published: Yes
Event: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece
Duration: 4 Jun 2023 – 10 Jun 2023

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2023-June
ISSN (Print): 1520-6149

Conference

Conference: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Country/Territory: Greece
City: Rhodes Island
Period: 4/06/23 – 10/06/23

Keywords

  • Pre-training
  • masked autoencoders
