Due to domain shift, a trained crowd counting model usually suffers a large performance drop when deployed in the wild. While existing domain-adaptive crowd counting methods achieve promising results, they typically treat each crowd image as a whole and reduce domain discrepancies in a holistic manner, which limits further improvement in adaptation performance. To address this limitation, we propose to disentangle the \emph{domain-invariant} crowd from the \emph{domain-specific} background in crowd images and design a fine-grained domain adaptation method for crowd counting. Specifically, to separate crowd from background, we learn crowd segmentation from point-level crowd counting annotations in a weakly supervised manner. Based on the derived segmentation, we design a crowd-aware domain adaptation mechanism consisting of two crowd-aware adaptation modules: Crowd Region Transfer (CRT) and Crowd Density Alignment (CDA). The CRT module guides the transfer of crowd features across domains without distraction from the background. The CDA module regularises target-domain crowd density generation using the target domain's own crowd density distribution. Our method consistently outperforms previous approaches in widely used adaptation scenarios.
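To make the idea of separating crowd from background more concrete, the following is a minimal sketch of how point-level head annotations could be turned into a coarse crowd mask and then used to restrict feature statistics to crowd regions. It is an illustration under stated assumptions, not the method described above: the Gaussian density map, the fixed threshold, and the function names `points_to_density`, `density_to_mask`, and `masked_mean_feature` are all hypothetical choices, and the learned weakly supervised segmentation is replaced here by simple thresholding.

```python
import numpy as np


def points_to_density(points, shape, sigma=4.0):
    """Build a density map by placing one normalised Gaussian per annotated head.

    points: iterable of (row, col) head positions (assumed annotation format).
    shape:  (height, width) of the output map.
    """
    h, w = shape
    rows = np.arange(h, dtype=np.float64)[:, None]
    cols = np.arange(w, dtype=np.float64)[None, :]
    density = np.zeros(shape, dtype=np.float64)
    for r, c in points:
        g = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        density += g / g.sum()  # each person contributes a total mass of 1
    return density


def density_to_mask(density, threshold=1e-3):
    """Assumed stand-in for learned segmentation: threshold the density map."""
    return (density > threshold).astype(np.float64)


def masked_mean_feature(features, mask):
    """Average a (C, H, W) feature map over crowd pixels only."""
    area = mask.sum()
    if area == 0:
        return np.zeros(features.shape[0])
    return (features * mask[None]).sum(axis=(1, 2)) / area


# Toy usage: two annotated heads in a 32x32 image.
points = [(10, 10), (12, 14)]
density = points_to_density(points, (32, 32))
mask = density_to_mask(density)
features = np.ones((3, 32, 32))
crowd_feature = masked_mean_feature(features, mask)
```

In this sketch the density map integrates to the number of annotated people, and the mask confines any later cross-domain feature comparison to pixels near annotated heads, which is the intuition behind adapting crowd regions separately from background.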