Domain shift across crowd data severely hinders crowd counting models from generalizing to unseen scenarios. Although domain adaptive crowd counting approaches close this gap to a certain extent, they still depend on target domain data to adapt (e.g., fine-tune) their models to the specific domain. In this paper, we aim to train a model on a single source domain that generalizes well to any unseen domain. This falls into the realm of domain generalization, which remains unexplored in crowd counting. We first introduce a dynamic sub-domain division scheme that divides the source domain into multiple sub-domains, so that we can initiate a meta-learning framework for domain generalization. The sub-domain division is dynamically refined during meta-learning. Next, in order to disentangle domain-invariant information from domain-specific information in image features, we design domain-invariant and domain-specific crowd memory modules to re-encode image features. Two types of losses, i.e., a feature reconstruction loss and an orthogonal loss, are devised to enable this disentanglement. Extensive experiments on several standard crowd counting benchmarks, namely SHA, SHB, QNRF, and NWPU, show the strong generalizability of our method.
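To make the disentanglement idea concrete, the following is a minimal sketch of how a feature might be re-encoded through two memory banks and scored with a reconstruction loss and an orthogonal loss. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the attention-style read in `reencode`, the mean-squared-error form of `reconstruction_loss`, the squared-cosine form of `orthogonal_loss`, and the toy memory contents are all hypothetical and not the authors' implementation.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reencode(feature, memory):
    # Assumed read operation: attention-weighted sum of memory slots.
    weights = softmax([dot(feature, slot) for slot in memory])
    return [sum(w * slot[i] for w, slot in zip(weights, memory))
            for i in range(len(feature))]

def reconstruction_loss(feature, invariant, specific):
    # Assumed form: MSE between the feature and the sum of both re-encodings.
    return sum((f - (a + b)) ** 2
               for f, a, b in zip(feature, invariant, specific)) / len(feature)

def orthogonal_loss(invariant, specific):
    # Assumed form: squared cosine similarity, zero for orthogonal vectors.
    norm = math.sqrt(dot(invariant, invariant)) * math.sqrt(dot(specific, specific))
    if norm == 0.0:
        return 0.0
    return (dot(invariant, specific) / norm) ** 2

# Toy example: the two hypothetical memories span disjoint coordinates.
feature = [1.0, 2.0, 0.5, -1.0]
invariant_memory = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
specific_memory = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

invariant = reencode(feature, invariant_memory)
specific = reencode(feature, specific_memory)
loss_rec = reconstruction_loss(feature, invariant, specific)
loss_orth = orthogonal_loss(invariant, specific)
```

In this toy setup the two memories occupy disjoint coordinates, so the orthogonal loss is zero by construction. In a trained model the memory slots would be learned, and the two losses would be minimized together with the counting objective.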