Crowd localization aims to predict the precise location of each instance within an image. Current advanced methods cast the task as pixel-wise binary classification to handle congested scenes: pixel-level thresholds binarize the predicted confidence that a pixel belongs to a pedestrian head. Because crowd scenes vary widely in content, count and scale, the confidence-threshold learner is fragile and generalizes poorly under domain shift. Moreover, in most cases the target domain is unknown during training. It is therefore imperative to explore how to improve the generalization of the confidence-threshold locator to the latent target domain. In this paper, we propose a Dynamic Proxy Domain (DPD) method to generalize the learner under domain shift. Concretely, based on a theoretical analysis of the upper bound of the generalization error risk of a binary classifier on the latent target domain, we propose introducing a generated proxy domain to facilitate generalization. Guided by this theory, we design a DPD algorithm, composed of a training paradigm and a proxy domain generator, to enhance the domain generalization of the confidence-threshold learner. We evaluate our method on five kinds of domain shift scenarios, demonstrating its effectiveness in generalizing crowd localization. Our code will be available at https://github.com/zhangda1018/DPD.
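To make the confidence-threshold idea concrete, the following is a minimal sketch of pixel-wise binarization followed by a simple localization step. It is not the DPD algorithm or the authors' code: the function names `binarize` and `locate_heads`, the toy maps, and the use of 4-connected blob centroids as head locations are all assumptions made for illustration. In the methods the abstract describes, both the confidence map and the threshold map would be produced by a learned model.

```python
from collections import deque


def binarize(confidence, thresholds):
    """Mark a pixel as 'head' (1) when its confidence exceeds its own threshold."""
    return [
        [1 if c > t else 0 for c, t in zip(conf_row, thr_row)]
        for conf_row, thr_row in zip(confidence, thresholds)
    ]


def locate_heads(binary):
    """Return the (row, col) centroid of every 4-connected foreground blob.

    Assumption for illustration: each blob corresponds to one head.
    """
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    centers = []
    for r in range(height):
        for c in range(width):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects all pixels of this blob.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            center_row = sum(p[0] for p in pixels) / len(pixels)
            center_col = sum(p[1] for p in pixels) / len(pixels)
            centers.append((center_row, center_col))
    return centers


# Toy confidence map (hypothetical values, not model output).
confidence = [
    [0.9, 0.8, 0.1, 0.1],
    [0.7, 0.9, 0.1, 0.6],
    [0.1, 0.1, 0.1, 0.7],
]
# Toy per-pixel threshold map; one pixel has a stricter threshold.
thresholds = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, 0.5, 0.65],
    [0.5, 0.5, 0.5, 0.5],
]

binary_map = binarize(confidence, thresholds)
head_centers = locate_heads(binary_map)
print(binary_map)
print(head_centers)
```

In this toy input, the pixel with confidence 0.6 is rejected only because its local threshold is 0.65, which illustrates why the threshold map matters as much as the confidence map. If either map is miscalibrated on an unseen domain, the binarized output changes, which is the fragility under domain shift that the abstract describes.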