Existing crowd counting models require extensive training data, which is time-consuming to annotate. To tackle this issue, we propose a simple yet effective crowd counting method that uses the Segment-Everything-Everywhere Model (SEEM), an adaptation of the Segment Anything Model (SAM), to generate pseudo-labels for training crowd counting models. However, our initial investigation reveals that SEEM's performance in dense crowd scenes is limited, primarily because it misses many people in high-density areas. To overcome this limitation, we propose an adaptive-resolution SEEM that handles the scale variation, occlusion, and overlap of people within crowd scenes. Alongside this, we introduce a robust localization method, based on Gaussian Mixture Models, for predicting head positions within the predicted person masks. Given the mask and point pseudo-labels, we propose a robust loss function that excludes uncertain regions, identified from SEEM's predictions, from the training of the counting networks. Finally, we propose an iterative method for generating pseudo-labels, which improves the quality of the segmentation masks by identifying more of the tiny people in high-density regions who are often missed in the first pseudo-labeling stage. Overall, our proposed method achieves the best performance among unsupervised crowd counting methods, while also achieving results comparable to some supervised methods. This makes it an effective and versatile tool for crowd counting, especially in situations where labeled data is not available.
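The idea of a loss that excludes uncertain regions can be illustrated with a minimal sketch. The abstract does not give the form of the loss, so everything here is an assumption made for illustration: a mean squared error between a predicted and a pseudo-label density map, a binary `uncertain` mask marking pixels to ignore, and the hypothetical function name `masked_mse`.

```python
import numpy as np

def masked_mse(pred, target, uncertain):
    """Mean squared error computed only over pixels not flagged as uncertain.

    pred, target: arrays of the same shape (e.g. density maps).
    uncertain:    boolean-like array of the same shape; True marks pixels
                  that should not contribute to the loss (assumed input).
    """
    valid = ~np.asarray(uncertain, dtype=bool)
    n_valid = int(valid.sum())
    if n_valid == 0:
        # No reliable pixels: contribute nothing rather than divide by zero.
        return 0.0
    diff = (np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))[valid]
    return float((diff ** 2).sum() / n_valid)

# Toy example: one of four pixels is treated as uncertain and ignored.
pred = np.array([[1.0, 2.0], [3.0, 4.0]])
target = np.zeros((2, 2))
uncertain = np.array([[False, True], [False, False]])
loss = masked_mse(pred, target, uncertain)
```

Normalizing by the number of valid pixels, rather than by the full image size, keeps the loss scale comparable between images with large and small uncertain regions; that choice is also an assumption of this sketch, not something the abstract states.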