R-MAE: Regions Meet Masked Autoencoders

Vision-specific concepts such as "region" have played a key role in extending general machine learning frameworks to tasks like object detection. Given the success of region-based detectors in supervised learning and the progress of intra-image methods in contrastive learning, we explore the use of regions for reconstructive pre-training. Taking Masked Autoencoding (MAE) as both a baseline and an inspiration, we propose a parallel pretext task tailored to the one-to-many mapping between images and regions. Because such regions can be generated without supervision, our approach (R-MAE) inherits MAE's wide applicability while being more "region-aware". We conduct thorough analyses during the development of R-MAE and converge on a variant that is both effective and efficient (1.3% overhead over MAE). Moreover, it shows consistent quantitative improvements when generalized to various pre-training data and to downstream detection and segmentation benchmarks. Finally, we provide extensive qualitative visualizations to enhance the understanding of R-MAE's behaviour and potential. Code will be made available at https://github.com/facebookresearch/r-mae.
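To make the "parallel pretext task" and the one-to-many mapping more concrete, the following is a minimal sketch of the input-preparation step only. It is not taken from the R-MAE code: it assumes each region is given as a binary map with the same size as the image, that the image and all of its regions share one random patch mask, and that the mask ratio is 0.75 (MAE's commonly used default, not stated in the abstract). The function names `patchify`, `random_patch_mask`, and `build_pretext_inputs` are hypothetical, and no encoder, decoder, or loss is shown.

```python
import random


def patchify(grid, patch):
    """Split an H x W grid (list of lists) into flattened patches, row-major."""
    height, width = len(grid), len(grid[0])
    assert height % patch == 0 and width % patch == 0
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([grid[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def random_patch_mask(num_patches, mask_ratio, rng):
    """MAE-style uniform random masking: return (visible, masked) index lists."""
    num_masked = int(round(num_patches * mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    return sorted(order[num_masked:]), sorted(order[:num_masked])


def build_pretext_inputs(image, region_maps, patch, mask_ratio=0.75, seed=0):
    """Prepare visible inputs and reconstruction targets for one image.

    Assumption (illustrative only): the same mask is applied to the image
    and to every binary region map, so one image yields many region targets.
    """
    rng = random.Random(seed)
    image_patches = patchify(image, patch)
    visible, masked = random_patch_mask(len(image_patches), mask_ratio, rng)
    sample = {
        "visible_idx": visible,
        "masked_idx": masked,
        "image_visible": [image_patches[i] for i in visible],
        "image_target": [image_patches[i] for i in masked],
        "regions": [],
    }
    for region_map in region_maps:
        region_patches = patchify(region_map, patch)
        sample["regions"].append({
            "visible": [region_patches[i] for i in visible],
            "target": [region_patches[i] for i in masked],
        })
    return sample


if __name__ == "__main__":
    # Toy 8 x 8 "image" and two hand-made binary regions (hypothetical data).
    image = [[r * 8 + c for c in range(8)] for r in range(8)]
    left_half = [[1 if c < 4 else 0 for c in range(8)] for r in range(8)]
    top_half = [[1 if r < 4 else 0 for c in range(8)] for r in range(8)]
    sample = build_pretext_inputs(image, [left_half, top_half], patch=2)
    print(len(sample["visible_idx"]), "visible patches,",
          len(sample["masked_idx"]), "masked patches,",
          len(sample["regions"]), "regions")
```

In this sketch a model would see only the visible patches and be asked to reconstruct the masked image patches and, in parallel, the masked patches of each region map. How R-MAE actually encodes regions and combines the two objectives is not described in the abstract and is left out here.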

