Detecting pedestrians accurately in urban scenes is important for real-world applications such as autonomous driving and video surveillance. However, human-like objects are easily confused with pedestrians and often lead to false detections, while small-scale or heavily occluded pedestrians are easily missed because of their unusual appearance. Object regions alone are inadequate for addressing these challenges, so how to fully exploit more explicit, semantic contexts becomes a key problem. Meanwhile, previous context-aware pedestrian detectors either learn only latent contexts from visual clues or require laborious annotations to obtain explicit, semantic contexts. We therefore propose a novel approach, Vision-Language semantic self-supervision for context-aware Pedestrian Detection (VLPD), which models explicit semantic contexts without any extra annotations. First, we propose a self-supervised Vision-Language Semantic (VLS) segmentation method, which learns both fully supervised pedestrian detection and contextual segmentation through explicit labels of semantic classes that are self-generated by vision-language models. Second, we propose a self-supervised Prototypical Semantic Contrastive (PSC) learning method that better discriminates pedestrians from other classes, based on the more explicit, semantic contexts obtained from VLS. Extensive experiments on popular benchmarks show that the proposed VLPD outperforms previous state-of-the-art methods, particularly under challenging conditions such as small scale and heavy occlusion. Code is available at https://github.com/lmy98129/VLPD.
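The idea of contrasting pedestrian features against class prototypes can be illustrated with a small sketch. This is not the authors' PSC implementation (see the linked repository for that); it is a generic prototype-based InfoNCE loss, written under the assumption that each semantic class is summarized by the mean of its normalized feature vectors. The function names, the temperature value, and the toy 2-D features are all hypothetical choices made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def class_prototype(features):
    """Prototype of one class: mean of its unit-length features, re-normalized."""
    normed = [l2_normalize(f) for f in features]
    dim = len(normed[0])
    mean = [sum(f[i] for f in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)


def prototypical_contrastive_loss(query, positive_proto, negative_protos,
                                  temperature=0.1):
    """InfoNCE-style loss: pull the query toward its own class prototype
    and push it away from the prototypes of the other classes.

    Prototypes are assumed to be unit length (e.g. from class_prototype).
    """
    q = l2_normalize(query)

    def scaled_similarity(proto):
        return sum(a * b for a, b in zip(q, proto)) / temperature

    # The positive logit comes first, followed by one logit per negative.
    logits = [scaled_similarity(positive_proto)]
    logits += [scaled_similarity(p) for p in negative_protos]

    # Numerically stable log-sum-exp over all logits.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(l - max_logit) for l in logits)
    )
    return log_denominator - logits[0]


# Toy features (hypothetical): "pedestrian" points along x, "road" along y.
pedestrian_proto = class_prototype([[1.0, 0.0], [0.9, 0.1]])
road_proto = class_prototype([[0.0, 1.0], [0.1, 0.9]])

loss_aligned = prototypical_contrastive_loss(
    [1.0, 0.05], pedestrian_proto, [road_proto])
loss_confused = prototypical_contrastive_loss(
    [0.05, 1.0], pedestrian_proto, [road_proto])
```

In this toy setup, a query feature that resembles the pedestrian prototype yields a smaller loss than one that resembles the road prototype, which is the behavior a contrastive objective of this kind is meant to encourage.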