With the rapid development of deep learning, object detection and tracking play a vital role in today's society. Identifying and tracking every pedestrian in a dense crowd scene with computer vision is a typical challenge in this field, known as the Multiple Object Tracking (MOT) challenge. Modern trackers are required to operate on increasingly complicated scenes: according to the MOT20 challenge results, pedestrian density is 4 times higher than in the MOT17 challenge. The aim of this work is therefore to improve detection and tracking in extremely crowded scenes. Because human bodies are frequently occluded in such scenes, heads are usually easier to identify than full bodies. We design a joint head and body detector in an anchor-free style to boost detection recall and precision for small- and medium-sized pedestrians. Notably, our model does not require the statistical head-body ratio of typical pedestrians as prior information for training; instead, it learns the ratio dynamically. To verify the effectiveness of the proposed model, we conduct extensive experiments on the MOT20, CrowdHuman, and HT21 datasets. The proposed method significantly improves both recall and precision on small- and medium-sized pedestrians and achieves state-of-the-art results on these challenging datasets.