Visual crowd counting estimates crowd density using deep learning models such as convolutional neural networks (CNNs). Model performance relies heavily on the quality of the crowd images used for training. In harsh conditions such as fog, dust, and low light, inference performance may degrade severely on noisy and blurred images. In this paper, we propose using a Pix2Pix generative adversarial network (GAN) to denoise crowd images before passing them to the counting model. A Pix2Pix network is trained on synthetic noisy images generated from original crowd images, and the pretrained generator is then used in the inference engine to estimate crowd density in unseen, noisy crowd images. The method is evaluated on the JHU-Crowd dataset to validate its significance, particularly when high reliability and accuracy are required.
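The two stages described in the abstract (building synthetic noisy/clean training pairs, then denoising before counting at inference time) could be sketched roughly as follows. This is only an illustrative sketch under stated assumptions: the abstract does not specify the degradation model, so the uniform haze blend, brightness gain, and Gaussian noise used here are assumptions, as are the function names `degrade`, `make_pair`, and `count_with_denoising`. The `generator` and `counter` arguments are hypothetical callables standing in for the pretrained Pix2Pix generator and a density-map counting model.

```python
import numpy as np


def degrade(clean, rng, fog=0.5, airlight=0.9, gain=0.6, sigma=0.05):
    """Return a synthetically degraded copy of `clean` (float array in [0, 1]).

    Assumed degradation model, not taken from the paper:
    uniform haze, then darkening, then additive Gaussian noise.
    """
    # Fog/dust: blend every pixel toward a bright "airlight" value.
    transmission = 1.0 - fog
    hazy = clean * transmission + airlight * (1.0 - transmission)
    # Low light: scale intensities down by a constant gain.
    dark = hazy * gain
    # Sensor noise: zero-mean additive Gaussian noise.
    noisy = dark + rng.normal(0.0, sigma, size=clean.shape)
    return np.clip(noisy, 0.0, 1.0).astype(np.float32)


def make_pair(clean, rng):
    """Build one (noisy input, clean target) pair with random severity."""
    noisy = degrade(
        clean,
        rng,
        fog=rng.uniform(0.0, 0.7),
        gain=rng.uniform(0.4, 1.0),
        sigma=rng.uniform(0.0, 0.08),
    )
    return noisy, clean


def count_with_denoising(image, generator, counter):
    """Denoise first, then count: the count is the sum of the density map."""
    restored = generator(image)   # hypothetical pretrained Pix2Pix generator
    density = counter(restored)   # hypothetical density-map counting model
    return float(np.sum(density))
```

Pairs produced by `make_pair` would serve as the (input, target) examples for training the image-to-image network; at inference time only `count_with_denoising` is needed, with the trained generator and the counting model passed in as callables.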