Masked Autoencoders as Image Processors

Transformers have proven effective across a range of vision tasks, both high-level and low-level. Recently, masked autoencoder (MAE) pre-training has further unlocked the potential of Transformers, leading to state-of-the-art performance on various high-level vision tasks. However, the value of MAE pre-training for low-level vision tasks has not been sufficiently explored. In this paper, we show that masked autoencoders are also scalable self-supervised learners for image processing tasks. We first present CSformer, an efficient Transformer model that combines channel attention with shifted-window-based self-attention. We then develop an effective MAE architecture for image processing tasks (MAEIP). Extensive experimental results show that, with MAEIP pre-training, the proposed CSformer achieves state-of-the-art performance on various image processing tasks, including Gaussian denoising, real image denoising, single-image motion deblurring, defocus deblurring, and image deraining.
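As a rough illustration of the masking step that MAE-style pre-training relies on, the sketch below hides a random subset of non-overlapping image patches; a model would then be trained to reconstruct the hidden pixels. It is not the MAEIP procedure, which the abstract does not specify. The function names, the 0.75 mask ratio, the patch size, and the zero fill value are all assumptions chosen for illustration.

```python
import random


def random_patch_mask(grid_h, grid_w, mask_ratio=0.75, seed=0):
    """Return a grid_h x grid_w grid of booleans; True marks a masked patch.

    mask_ratio is an illustrative placeholder, not a value from the paper.
    """
    num_patches = grid_h * grid_w
    num_masked = round(mask_ratio * num_patches)
    rng = random.Random(seed)
    # Choose which patch indices to hide, without replacement.
    masked_ids = set(rng.sample(range(num_patches), num_masked))
    return [
        [(r * grid_w + c) in masked_ids for c in range(grid_w)]
        for r in range(grid_h)
    ]


def apply_patch_mask(image, mask, patch_size):
    """Zero every pixel that falls inside a masked patch.

    image is a list of rows of pixel values; its height and width are
    assumed to be multiples of patch_size.
    """
    return [
        [0 if mask[y // patch_size][x // patch_size] else px
         for x, px in enumerate(row)]
        for y, row in enumerate(image)
    ]


# Usage: an 8x8 single-channel image split into a 4x4 grid of 2x2 patches.
image = [[1] * 8 for _ in range(8)]
mask = random_patch_mask(4, 4, mask_ratio=0.75, seed=0)
masked_image = apply_patch_mask(image, mask, patch_size=2)
```

The reconstruction target in such a setup would be the original `image`, with the loss typically computed on the masked patches; how MAEIP defines its target and loss is not stated in the abstract.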


