Recent end-to-end multi-object detectors simplify the inference pipeline by removing hand-crafted processes such as non-maximum suppression (NMS). During training, however, they still rely heavily on heuristics and hand-crafted processes that degrade the reliability of the predicted confidence scores. In this paper, we propose a novel framework for training an end-to-end multi-object detector with a loss consisting of only two terms: a negative log-likelihood (NLL) term and a regularization term. To this end, we treat multi-object detection as density estimation of the ground-truth bounding boxes using a regularized mixture density model. The proposed \textit{end-to-end multi-object Detection with a Regularized Mixture Model} (D-RMM) is trained by minimizing the NLL together with the proposed regularization term, the maximum component maximization (MCM) loss, which prevents duplicate predictions. Our method reduces the heuristics in the training process and improves the reliability of the predicted confidence scores. Moreover, D-RMM outperforms previous end-to-end detectors on the MS COCO dataset.
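To make the NLL term concrete, the sketch below scores a set of ground-truth boxes under a $K$-component mixture, treating each box as an i.i.d. sample and computing $-\sum_n \log \sum_k \pi_k \, p_k(b_n)$. It is a minimal illustration under stated assumptions, not the D-RMM implementation: the abstract does not specify the component densities or the box parameterization, so diagonal Gaussian components over a 4-dimensional box vector and softmax mixing weights are assumed here, and all function names are hypothetical. The MCM regularizer is not shown, since its form is not given in the abstract.

```python
import math
from typing import List, Sequence

# Assumption: a box is a 4-vector, e.g. (cx, cy, w, h); D-RMM's actual
# parameterization is not specified in the abstract.
Box = Sequence[float]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Stable log-softmax: converts mixing logits into log mixing weights log(pi_k)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def logsumexp(values: Sequence[float]) -> float:
    """Stable log(sum(exp(v))) over per-component log-probabilities."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_normal_diag(x: Box, mean: Box, std: Box) -> float:
    """Log-density of a diagonal Gaussian (assumed component density) at x."""
    total = 0.0
    for xi, mi, si in zip(x, mean, std):
        z = (xi - mi) / si
        total += -0.5 * z * z - math.log(si) - 0.5 * math.log(2.0 * math.pi)
    return total


def mixture_nll(
    gt_boxes: Sequence[Box],
    logits: Sequence[float],
    means: Sequence[Box],
    stds: Sequence[Box],
) -> float:
    """Hypothetical NLL of ground-truth boxes under a K-component mixture.

    Computes -sum_n log sum_k pi_k * N(b_n | mu_k, diag(sigma_k^2)),
    using log-sum-exp over components for numerical stability.
    """
    log_pi = log_softmax(logits)
    nll = 0.0
    for b in gt_boxes:
        per_component = [
            lp + log_normal_diag(b, mu, sd)
            for lp, mu, sd in zip(log_pi, means, stds)
        ]
        nll -= logsumexp(per_component)
    return nll


if __name__ == "__main__":
    # Two predicted components; only the first lies near the single GT box.
    gt = [[0.50, 0.50, 0.20, 0.30]]
    logits = [2.0, 0.0]
    means = [[0.52, 0.49, 0.21, 0.29], [0.10, 0.80, 0.40, 0.40]]
    stds = [[0.05] * 4, [0.05] * 4]
    print(mixture_nll(gt, logits, means, stds))
```

Because each ground-truth box contributes through a sum over all components, a single well-placed component can explain a box without any explicit one-to-one matching step. Whether and how duplicate components are discouraged is the role the abstract assigns to the MCM loss, which this sketch does not attempt to reproduce.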