Semi-Supervised Visual Grounding (SSVG) is a new challenge because it combines sparse labeled data with the need for multimodal understanding. A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision. However, this approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline. These pipelines regress results directly, without region proposals or foreground binary classification, so they produce no confidence scores and cannot be fitted into RefTeacher. Furthermore, the teacher and student receive differently augmented inputs, and the resulting geometric difference naturally misaligns the attention-based constraints. To establish a compatible SSVG framework, our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS. First, the model is augmented with an additional quantized detection head that exposes its detection confidence. Building upon this, ACTRESS consists of an active sampling strategy and a selective retraining strategy. The active sampling strategy iteratively selects high-quality pseudo labels by evaluating three crucial aspects: Faithfulness, Robustness, and Confidence, which optimizes the utilization of unlabeled data. The selective retraining strategy retrains the model with periodic re-initialization of specific parameters, which helps the model escape from local minima. Extensive experiments demonstrate the superior performance of our method on widely used benchmark datasets.
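To make the active sampling idea concrete, the following is a minimal sketch of selecting pseudo labels from three per-sample scores. It is not the ACTRESS implementation: the abstract does not specify how Faithfulness, Robustness, and Confidence are computed or combined, so the `PseudoLabel` container, the per-criterion thresholds, the all-criteria-must-pass rule, and the ranking by weakest score are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PseudoLabel:
    """Hypothetical container for one pseudo-labeled sample."""
    sample_id: str
    box: Tuple[float, float, float, float]  # predicted (x1, y1, x2, y2)
    faithfulness: float  # assumed to lie in [0, 1]
    robustness: float    # assumed to lie in [0, 1]
    confidence: float    # assumed to lie in [0, 1]


def select_pseudo_labels(
    candidates: List[PseudoLabel],
    thresholds: Dict[str, float],
    top_k: Optional[int] = None,
) -> List[PseudoLabel]:
    """Keep candidates that pass every threshold, best first.

    Assumption: a candidate is kept only if all three scores clear
    their thresholds, and kept candidates are ranked by their
    weakest score. The paper may combine the criteria differently.
    """
    kept = [
        c for c in candidates
        if c.faithfulness >= thresholds["faithfulness"]
        and c.robustness >= thresholds["robustness"]
        and c.confidence >= thresholds["confidence"]
    ]
    # Rank by the weakest of the three scores, highest first.
    kept.sort(
        key=lambda c: min(c.faithfulness, c.robustness, c.confidence),
        reverse=True,
    )
    return kept if top_k is None else kept[:top_k]
```

In an iterative scheme, a function like this would be called once per round on the model's current predictions for the unlabeled pool, and the selected samples would be added to the training set before retraining.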