DEtection TRansformer (DETR) and its variants (DETRs) have been successfully applied to crowded pedestrian detection with promising performance. However, we find that the number of queries in DETRs must be adjusted manually for scenes with different degrees of crowding; otherwise, performance degrades to varying degrees. In this paper, we first analyze the two current query generation methods and summarize four guidelines for designing an adaptive query generation method. We then propose Rank-based Adaptive Query Generation (RAQG) to alleviate the problem. Specifically, we design a rank prediction head that predicts the rank of the lowest-confidence positive training sample produced by the encoder. Based on the predicted rank, we design an adaptive selection method that selects coarse detection results produced by the encoder to generate queries. Moreover, to better train the rank prediction head, we propose the Soft Gradient L1 Loss. Its gradient is continuous, so it can describe the relationship between the loss value and the update of the model parameters at a fine granularity. Our method is simple and effective and, in theory, can be plugged into any DETR variant to make it query-adaptive. Experimental results on the CrowdHuman and CityPersons datasets show that our method adaptively generates queries for DETRs and achieves competitive results. In particular, it achieves a state-of-the-art 39.4% MR on CrowdHuman.
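The rank-based selection idea can be illustrated with a minimal sketch. It is not the RAQG implementation: the function names, the rounding rule, and the `min_queries`/`max_queries` bounds are hypothetical choices made for illustration, and the predicted rank is passed in as a plain number instead of coming from a trained rank prediction head. The sketch also contrasts the discontinuous gradient of the plain L1 loss with the continuous gradient of the standard Smooth L1 loss, which serves only as a stand-in here; the abstract does not give the formula of the Soft Gradient L1 Loss.

```python
import math


def select_queries(scores, predicted_rank, min_queries=1, max_queries=None):
    """Return indices of the coarse detections kept as decoder queries.

    scores: confidence of each coarse detection produced by the encoder.
    predicted_rank: estimated 1-based rank of the lowest-confidence positive
        sample (hypothetical input; a rank prediction head would supply it).
    min_queries, max_queries: hypothetical safety bounds, not from the paper.
    """
    if max_queries is None:
        max_queries = len(scores)
    # Round up so the detection at the predicted rank is itself kept.
    k = math.ceil(predicted_rank)
    k = max(min_queries, min(k, max_queries, len(scores)))
    # Sort indices by descending confidence; ties keep their original order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order[:k]


def l1_grad(x):
    """Gradient of |x|: jumps from -1 to +1 at x = 0."""
    return (x > 0) - (x < 0)


def smooth_l1_grad(x, beta=1.0):
    """Gradient of the standard Smooth L1 loss: continuous in x.

    Used only as a stand-in for a loss with a continuous gradient.
    """
    return max(-1.0, min(1.0, x / beta))


# A sparse scene needs few queries, a crowded scene needs more.
sparse_scene = [0.9, 0.2, 0.8, 0.1, 0.05]
crowded_scene = [0.9, 0.7, 0.8, 0.6, 0.75]
print(select_queries(sparse_scene, predicted_rank=2.0))
print(select_queries(crowded_scene, predicted_rank=4.3))
```

With the same code path, the number of generated queries follows the predicted rank instead of a hand-tuned constant, which is the behavior the abstract describes as query-adaptive.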