Most weakly supervised object detection (WSOD) methods rely on traditional object proposals to generate candidate regions and suffer from unstable training that easily gets stuck in poor local optima. In this paper, we introduce HUWSOD, a unified, high-capacity WSOD network that uses a comprehensive self-training framework and requires no external modules or additional supervision. HUWSOD incorporates a self-supervised proposal generator and an autoencoder proposal generator with a multi-rate resampling pyramid to replace traditional object proposals, enabling end-to-end WSOD training and inference. In addition, we adopt a holistic self-training scheme that refines detection scores and box coordinates through step-wise entropy minimization and consistency-constraint regularization, which encourages consistent predictions across stochastic augmentations of the same image. Extensive experiments on PASCAL VOC and MS COCO demonstrate that HUWSOD is competitive with state-of-the-art WSOD methods while eliminating the need for offline proposals and additional data. The peak performance of HUWSOD approaches that of fully supervised Faster R-CNN. Our findings also indicate that randomly initialized boxes, although significantly different from well-designed offline object proposals, are effective for WSOD training.
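To make the two self-training terms concrete, the following is a minimal sketch of an entropy penalty and a consistency penalty on per-proposal predictions. It is an illustration under stated assumptions, not the losses used in HUWSOD: the function names `entropy_loss` and `consistency_loss`, the squared-difference and L1 forms, and the assumption that boxes from both augmented views have already been mapped back to a common coordinate frame are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def entropy_loss(logits):
    """Mean Shannon entropy of the per-proposal class distributions.

    Minimizing this term pushes each proposal toward a confident prediction.
    """
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

def consistency_loss(logits_a, logits_b, boxes_a, boxes_b):
    """Disagreement between two augmented views of the same proposals.

    Assumes (hypothetically) that proposals are matched one-to-one and that
    boxes are expressed in the same coordinate frame for both views.
    """
    score_term = np.mean((softmax(logits_a) - softmax(logits_b)) ** 2)
    box_term = np.mean(np.abs(np.asarray(boxes_a, dtype=float)
                              - np.asarray(boxes_b, dtype=float)))
    return float(score_term + box_term)
```

In this sketch, uniform class scores give the maximum entropy, and identical predictions from the two views give a consistency penalty of zero; how the terms are weighted and scheduled step by step is left unspecified here.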