Energy-Inspired Self-Supervised Pretraining for Vision Models

Motivated by the observation that the forward and backward passes of a deep network naturally form symmetric mappings between input and output representations, we introduce a simple yet effective self-supervised pretraining framework for vision models inspired by energy-based models (EBMs). In the proposed framework, we model energy estimation and data restoration as the forward and backward passes of a single network, without any auxiliary components such as an extra decoder. In the forward pass, we fit the network to an energy function that assigns low energy to samples from an unlabeled dataset and high energy to all other inputs. In the backward pass, we iteratively restore data from corrupted versions using gradient-based optimization along the direction of energy minimization. In this way, the encoder-decoder architecture widely used in masked image modeling is folded into the forward and backward passes of a single vision model. As a result, the framework accommodates a wide range of pretext tasks built on different data corruption methods, allowing models to be pretrained with masked image modeling, patch sorting, and image restoration tasks, including super-resolution, denoising, and colorization. Extensive experiments show that the proposed method achieves performance comparable to, or even better than, state-of-the-art self-supervised vision pretraining methods while requiring substantially fewer training epochs. These findings motivate further exploration of self-supervised vision pretraining and of pretext tasks beyond masked image modeling.
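The forward/backward symmetry described in the abstract can be illustrated with a deliberately tiny sketch. It rests on several assumptions that are not from the paper: instead of a trained vision network, the energy is a hypothetical closed-form function (the squared distance to the principal subspace of synthetic "unlabeled" data), so its input gradient can be written by hand; the corruption is masking of a few coordinates; and restoration applies the update x ← x − α∇ₓE(x) only to the masked entries. The helper names `fit_subspace_energy` and `restore`, the step size α, and the step count are illustrative choices, not the authors' method.

```python
import numpy as np


def fit_subspace_energy(data, rank):
    """Toy stand-in for a learned energy network (hypothetical, not the paper's model).

    E(x) = 0.5 * ||(I - U U^T)(x - mean)||^2, where U spans the top-`rank`
    principal directions of `data`. Points on the data subspace get zero
    energy; points far from it get high energy.
    """
    mean = data.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    basis = vt[:rank].T  # shape (dim, rank), orthonormal columns

    def residual(x):
        r = x - mean
        return r - basis @ (basis.T @ r)  # component of r off the subspace

    def energy(x):
        res = residual(x)
        return 0.5 * float(res @ res)

    def grad(x):
        # I - U U^T is a symmetric projector, so dE/dx is just the residual.
        return residual(x)

    return energy, grad


def restore(x_corrupt, mask, grad, step=1.0, n_steps=1000):
    """Restore corrupted entries by gradient descent on the energy w.r.t. the input.

    Only entries where `mask` is True are updated; visible entries stay
    fixed, mimicking a masked-image-modeling pretext task.
    """
    x = x_corrupt.copy()
    for _ in range(n_steps):
        x[mask] -= step * grad(x)[mask]
    return x


# Synthetic "unlabeled dataset": 200 samples on a 3-D affine subspace of R^20.
rng = np.random.default_rng(0)
dim, rank = 20, 3
true_basis = np.linalg.qr(rng.normal(size=(dim, rank)))[0]
offset = rng.normal(size=dim)
data = offset + rng.normal(size=(200, rank)) @ true_basis.T

energy, grad = fit_subspace_energy(data, rank)

# Corrupt a fresh sample from the same distribution by zeroing 5 coordinates.
clean = offset + true_basis @ rng.normal(size=rank)
mask = np.zeros(dim, dtype=bool)
mask[rng.choice(dim, size=5, replace=False)] = True
corrupt = np.where(mask, 0.0, clean)

restored = restore(corrupt, mask, grad)
print(f"energy: corrupt={energy(corrupt):.3f}, restored={energy(restored):.2e}")
print(f"max restoration error: {np.abs(restored - clean).max():.2e}")
```

In this toy setting the energy is quadratic and its curvature along the masked coordinates is at most 1, so a step size of 1.0 keeps the descent stable. In a real vision model the gradient ∇ₓE(x) would instead come from backpropagating the network's energy output to its input, which is what lets a single network play the roles of both encoder and decoder.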

翻译：受深度网络前向与反向传播天然形成输入输出表示对称映射的启发，我们提出一种简洁高效的自监督视觉模型预训练框架，该框架受能量模型（EBMs）启发。在该框架中，我们将能量估计与数据修复建模为单一网络的前向与反向传播，无需任何辅助组件（如额外解码器）。在前向传播中，我们将网络拟合为能量函数，为无标签数据集样本分配低能量值，其他样本则分配高能量值。在反向传播中，我们沿能量最小化方向通过梯度优化迭代修复受损数据。由此，我们巧妙地将掩码图像建模中广泛使用的编码器-解码器架构折叠到单一视觉模型的前向与反向传播中。因此，本框架可兼容多种不同数据损坏方式的预文本任务，支持从掩码图像建模、补丁排序、图像修复（包括超分辨率、去噪和着色）中进行模型预训练。通过大量实验验证，我们发现该方法能够以显著更少的训练轮次达到与现有最优自监督视觉模型预训练方法相当甚至更优的性能。本发现为探索超越掩码图像建模的自监督视觉模型预训练及预文本任务提供了新思路。