Pre-training plays a vital role in various vision tasks, such as object recognition and detection. Commonly used pre-training methods, which typically rely on randomized approaches like uniform or Gaussian distributions to initialize model parameters, often fall short when confronted with long-tailed distributions, especially in detection tasks. This is largely due to extreme data imbalance and the issue of simplicity bias. In this paper, we introduce a novel pre-training framework for object detection, called Dynamic Rebalancing Contrastive Learning with Dual Reconstruction (2DRCL). Our method builds on a Holistic-Local Contrastive Learning mechanism, which aligns pre-training with object detection by capturing both global contextual semantics and detailed local patterns. To tackle the imbalance inherent in long-tailed data, we design a dynamic rebalancing strategy that adjusts the sampling of underrepresented instances throughout the pre-training process, ensuring better representation of tail classes. Moreover, Dual Reconstruction addresses simplicity bias by enforcing a reconstruction task aligned with the self-consistency principle, specifically benefiting underrepresented tail classes. Experiments on COCO and LVIS v1.0 datasets demonstrate the effectiveness of our method, particularly in improving the mAP/AP scores for tail classes.
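The dynamic rebalancing strategy described above adjusts how often underrepresented instances are sampled during pre-training. As a rough illustration of this idea (not the paper's exact 2DRCL procedure), the sketch below computes per-image repeat factors in the style of LVIS-like repeat-factor sampling, where images containing rare categories are drawn more often; the function name and threshold are illustrative assumptions.

```python
import math
from collections import Counter

def repeat_factors(image_labels, threshold=0.001):
    """Illustrative class-rebalancing sampler: compute a per-image
    repeat factor so images containing rare categories are sampled
    more often. A hedged analogue of the dynamic rebalancing idea,
    not the 2DRCL implementation itself."""
    n_images = len(image_labels)
    # f(c): fraction of images that contain category c.
    counts = Counter(c for labels in image_labels for c in set(labels))
    freq = {c: n / n_images for c, n in counts.items()}
    # Category-level repeat factor r(c) = max(1, sqrt(t / f(c))):
    # categories rarer than the threshold t get r(c) > 1.
    r_cat = {c: max(1.0, math.sqrt(threshold / f)) for c, f in freq.items()}
    # Image-level factor: the max over categories present in the image,
    # so an image is repeated as often as its rarest category demands.
    return [max(r_cat[c] for c in set(labels)) if labels else 1.0
            for labels in image_labels]
```

In a pre-training loop, these factors would drive a weighted sampler (e.g. duplicating or re-weighting images), so tail-class instances appear more frequently in each epoch.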