Training a convolutional neural network (CNN) to detect infrared small targets in a fully supervised manner has attracted considerable research interest in recent years, but it is highly labor-intensive because it requires a large number of per-pixel annotations. To address this problem, we make the first attempt to achieve infrared small target detection with point-level supervision. Interestingly, during training supervised by point labels, we discover that CNNs first learn to segment a cluster of pixels near the targets and then gradually converge to predicting the ground-truth point labels. Motivated by this "mapping degeneration" phenomenon, we propose a label evolution framework named label evolution with single point supervision (LESPS), which progressively expands the point labels by leveraging the intermediate predictions of CNNs. In this way, the network predictions can finally approximate the updated pseudo labels, and a pixel-level target mask can be obtained to train CNNs in an end-to-end manner. We conduct extensive experiments with insightful visualizations to validate the effectiveness of our method. Experimental results show that CNNs equipped with LESPS can recover the target masks well from the corresponding point labels, and can achieve over 70% and 95% of their fully supervised performance in terms of pixel-level intersection over union (IoU) and object-level probability of detection (Pd), respectively. Code is available at https://github.com/XinyiYing/LESPS.
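To make the label evolution idea concrete, the following is a minimal sketch of a single label update step, not the authors' implementation (see the repository above for that). It assumes a fixed confidence threshold and 8-connected region growing from the current label; the names `evolve_label` and `iou` and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
from collections import deque


def evolve_label(label, pred, threshold=0.5):
    """Grow a binary pseudo label using confident predictions connected to it.

    label:     2D list of 0/1, the current pseudo label (initially the point label)
    pred:      2D list of floats in [0, 1], an intermediate network prediction
    threshold: hypothetical fixed cut-off (an assumption of this sketch)
    """
    h, w = len(label), len(label[0])
    new_label = [row[:] for row in label]
    # Seed the search from every pixel already labeled as target.
    queue = deque((r, c) for r in range(h) for c in range(w) if label[r][c])
    while queue:
        r, c = queue.popleft()
        # Visit the 8-connected neighbors of the current target pixel.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < h and 0 <= nc < w
                if inside and not new_label[nr][nc] and pred[nr][nc] > threshold:
                    # Confident and connected to the label: absorb the pixel.
                    new_label[nr][nc] = 1
                    queue.append((nr, nc))
    return new_label


def iou(mask_a, mask_b):
    """Pixel-level intersection over union of two binary masks."""
    pairs = [(a, b) for ra, rb in zip(mask_a, mask_b) for a, b in zip(ra, rb)]
    inter = sum(1 for a, b in pairs if a and b)
    union = sum(1 for a, b in pairs if a or b)
    return inter / union if union else 1.0
```

In this sketch, confident pixels that are not connected to the existing label are ignored, so an isolated false alarm elsewhere in the image does not enter the pseudo label. Applying `evolve_label` repeatedly during training, each time with the latest prediction, gives the progressive expansion described above under these assumptions.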