Masked image modeling (MIM) has gained significant traction as an approach to representation learning. As an alternative to the traditional approach, reconstruction from corrupted images has recently emerged as a promising pretext task. However, these corrupted images are typically produced by generic generators that have little relevance to the specific reconstruction task used in pre-training. As a result, reconstruction from such images cannot guarantee that the pretext task is sufficiently difficult, which may degrade performance. Moreover, generating corrupted images may require an extra generator, adding notable computational cost. To address these issues, we propose incorporating adversarial examples into masked image modeling as new reconstruction targets. Adversarial examples, generated online using only the models being trained, can directly aim to disrupt the tasks associated with pre-training. Their incorporation therefore both raises the difficulty of reconstruction and improves efficiency, helping the model learn better representations. Specifically, we introduce a novel auxiliary pretext task that reconstructs the adversarial examples corresponding to the original images. We also devise an adversarial attack that crafts adversarial examples better suited to MIM pre-training. Our method is not restricted to specific model architectures or MIM strategies, making it an adaptable plug-in for MIM methods. Experiments show that our approach improves the generalization and robustness of existing MIM methods. Notably, it outperforms the baselines on a range of tasks, including ImageNet, its variants, and other downstream tasks.
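As a rough illustration of the idea of adding adversarial examples as an auxiliary reconstruction target, the following is a minimal sketch under strong assumptions, not the method of this paper. The abstract does not specify the attack, so a generic FGSM-style signed-gradient step is swapped in. The "model" is a toy linear map on a flattened four-pixel image, and every name (`fgsm_example`, `mim_loss_with_adv_target`, `W_clean`, `W_adv`, `eps`, `lam`) is hypothetical.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_example(W, x, eps):
    """One FGSM-style step (an assumption, not the paper's attack).

    Perturbs the full image x so that the toy model's error
    mean((W x' - x)^2) does not decrease, with pixels kept in [0, 1].
    """
    n = len(x)
    residual = [p - t for p, t in zip(matvec(W, x), x)]
    # Gradient of the mean squared error w.r.t. the input: (2/n) * W^T residual
    grad = [2.0 / n * sum(W[i][j] * residual[i] for i in range(n)) for j in range(n)]
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]


def masked_mse(pred, target, masked):
    """Mean squared error over the masked positions only."""
    idx = [i for i, m in enumerate(masked) if m]
    return sum((pred[i] - target[i]) ** 2 for i in idx) / len(idx)


def mim_loss_with_adv_target(W_clean, W_adv, x, masked, eps=0.05, lam=1.0):
    """Standard masked reconstruction loss plus a hypothetical auxiliary term
    that reconstructs the adversarial example generated online from W_clean."""
    z = [0.0 if m else xi for xi, m in zip(x, masked)]  # hide masked pixels
    x_adv = fgsm_example(W_clean, x, eps)
    loss_clean = masked_mse(matvec(W_clean, z), x, masked)
    loss_adv = masked_mse(matvec(W_adv, z), x_adv, masked)
    return loss_clean + lam * loss_adv, x_adv


# Toy usage: a 4-pixel "image" with pixels 1 and 3 masked.
x = [0.2, 0.8, 0.5, 0.4]
masked = [False, True, False, True]
W = [
    [0.5, 0.1, 0.0, 0.2],
    [0.1, 0.6, 0.1, 0.0],
    [0.0, 0.2, 0.7, 0.1],
    [0.3, 0.0, 0.1, 0.5],
]
loss, x_adv = mim_loss_with_adv_target(W, W, x, masked, eps=0.05, lam=1.0)
print(loss, x_adv)
```

In this sketch the adversarial example is produced by the same weights that are being trained, so no separate generator is involved, which mirrors the efficiency argument in the abstract. A real implementation would replace the linear map with an encoder-decoder network and compute gradients with an automatic differentiation framework.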