This paper addresses semi-supervised crowd counting, where only a small portion of the training data is labeled. We model the pixel-wise density value to be regressed as a probability distribution rather than a single deterministic value. On this basis, we propose a semi-supervised crowd-counting model with three components. First, we design a pixel-wise distribution matching loss that measures the difference between the predicted and ground-truth pixel-wise density distributions. Second, we enhance the transformer decoder with density tokens that specialize the decoders' forward passes for different density intervals. Third, we design an interleaving consistency self-supervised learning mechanism to learn efficiently from unlabeled data. Extensive experiments on four datasets show that our method outperforms competing methods by a large margin under various labeled-ratio settings. Code will be released at https://github.com/LoraLinH/Semi-supervised-Counting-via-Pixel-by-pixel-Density-Distribution-Modelling.
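To make the idea of matching pixel-wise density distributions concrete, the following is a minimal sketch, not the paper's actual loss. The abstract does not give the formula, so the sketch assumes each pixel carries a discrete distribution over ordered density intervals and compares prediction and ground truth by the L1 distance between their cumulative distributions. The function names `cdf_distance` and `pixelwise_distribution_loss`, and the nested-list layout (rows, then pixels, then bins), are hypothetical and chosen for illustration only.

```python
from itertools import accumulate


def cdf_distance(pred_probs, true_probs):
    """L1 distance between the cumulative sums of two discrete distributions.

    Both inputs are probabilities over the same ordered density intervals.
    This is an assumed stand-in for the paper's distribution matching loss.
    """
    if len(pred_probs) != len(true_probs):
        raise ValueError("distributions must have the same number of bins")
    pred_cdf = accumulate(pred_probs)
    true_cdf = accumulate(true_probs)
    return sum(abs(p - t) for p, t in zip(pred_cdf, true_cdf))


def pixelwise_distribution_loss(pred_map, true_map):
    """Mean per-pixel CDF distance over an H x W map of K-bin distributions."""
    total = 0.0
    count = 0
    for pred_row, true_row in zip(pred_map, true_map):
        for pred_probs, true_probs in zip(pred_row, true_row):
            total += cdf_distance(pred_probs, true_probs)
            count += 1
    if count == 0:
        raise ValueError("maps must contain at least one pixel")
    return total / count


# A 1 x 2 map with 3 density intervals per pixel (illustrative values only).
pred = [[[0.0, 1.0, 0.0], [0.25, 0.5, 0.25]]]
true = [[[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]]
loss = pixelwise_distribution_loss(pred, true)
```

Under this assumption, a prediction that places its mass in an interval adjacent to the true one is penalized less than a prediction that places it in a distant interval, which a plain per-bin comparison would not capture.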