Detection Selection Algorithm: A Likelihood based Optimization Method to Perform Post Processing for Object Detection

In object detection, post-processing methods such as Non-Maximum Suppression (NMS) are widely used. NMS can substantially reduce the number of false-positive detections but may still keep some detections with low objectness scores. To find the exact number of objects in the image and their labels, we propose a post-processing method called the Detection Selection Algorithm (DSA), which is applied after NMS or related methods. DSA greedily selects a subset of the detected bounding boxes, together with full object reconstructions, that gives the interpretation of the whole image with the highest likelihood, taking object occlusions into account. The algorithm consists of four components. First, we add an occlusion branch to Faster R-CNN to obtain occlusion relationships between objects. Second, we develop a single reconstruction algorithm that reconstructs the whole appearance of an object from its visible part by optimizing the latent variables of a trained generative network, which we call the decoder. Third, we propose a whole reconstruction algorithm that generates the joint reconstruction of all objects in a hypothesized interpretation, taking occlusion ordering into account. Finally, we propose a greedy algorithm that incrementally adds or removes detections from a list to maximize the likelihood of the corresponding interpretation. DSA combined with NMS or Soft-NMS achieves better results than NMS or Soft-NMS alone, as illustrated by our experiments on synthetic images with multiple 3D objects.
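
The greedy selection step described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation. It assumes a 1-D "image", ready-made per-object reconstructions in place of decoder outputs, known depth values in place of the occlusion branch, and a fixed-variance Gaussian pixel likelihood. The names `compose`, `log_likelihood`, and `greedy_select` and the detection dictionary layout are hypothetical, chosen only for this example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical detection format: {"depth": float, "pixels": {index: intensity}}.
# "pixels" stands in for a full object reconstruction; a larger depth is farther away.
Detection = Dict[str, object]


def compose(detections: List[Detection], selected: Set[int], size: int) -> List[float]:
    """Jointly reconstruct the image by painting selected objects back to front."""
    canvas = [0.0] * size  # assumed background intensity of 0
    for i in sorted(selected, key=lambda k: -detections[k]["depth"]):
        for pixel, value in detections[i]["pixels"].items():
            canvas[pixel] = value  # nearer objects overwrite (occlude) farther ones
    return canvas


def log_likelihood(image: List[float], canvas: List[float], sigma: float = 0.1) -> float:
    """Gaussian log-likelihood up to an additive constant (assumed noise model)."""
    return -sum((x - y) ** 2 for x, y in zip(image, canvas)) / (2 * sigma ** 2)


def greedy_select(image: List[float], detections: List[Detection]) -> Tuple[Set[int], float]:
    """Repeatedly add or remove the single detection that most improves the likelihood."""
    selected: Set[int] = set()
    best = log_likelihood(image, compose(detections, selected, len(image)))
    while True:
        best_move, best_score = None, best
        for i in range(len(detections)):
            candidate = selected ^ {i}  # toggle: add if absent, remove if present
            score = log_likelihood(image, compose(detections, candidate, len(image)))
            if score > best_score:
                best_move, best_score = candidate, score
        if best_move is None:  # no single change helps, so stop
            return selected, best
        selected, best = best_move, best_score


# Toy scene: a far object (index 0) partly occluded by a near object (index 1),
# plus a spurious detection (index 2) that survived NMS.
image = [0.0, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0, 0.0]
detections = [
    {"depth": 2.0, "pixels": {1: 0.5, 2: 0.5, 3: 0.5, 4: 0.5}},
    {"depth": 1.0, "pixels": {3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0}},
    {"depth": 0.0, "pixels": {5: 0.3, 6: 0.3, 7: 0.3}},
]
selected, score = greedy_select(image, detections)
print(sorted(selected))  # prints [0, 1]
```

In this toy scene the spurious detection is rejected because including it lowers the likelihood of the composed image, while the partly occluded object is kept because its visible pixels explain part of the observation. Because every accepted move strictly increases the likelihood over a finite set of subsets, the loop always terminates, though only at a local optimum.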
