We propose an extremely simple and highly effective approach to faithfully combine different object detectors into a Mixture of Experts (MoE) that is more accurate than any individual expert in the mixture. We find that naively combining these experts, in a manner similar to the well-known Deep Ensembles (DEs), does not result in an effective MoE. We identify the incompatibility between the confidence score distributions of different detectors as the primary reason for such failure cases. Therefore, to construct the MoE, we propose to first calibrate each individual detector against a target calibration function, and then filter and refine the predictions of all detectors in the mixture. We term this approach MoCaE and demonstrate its effectiveness through extensive experiments on object detection, instance segmentation and rotated object detection tasks. Specifically, MoCaE improves (i) three strong object detectors on COCO test-dev by $2.4$ $\mathrm{AP}$, reaching $59.0$ $\mathrm{AP}$; (ii) instance segmentation methods on the challenging long-tailed LVIS dataset by $2.3$ $\mathrm{AP}$; and (iii) all existing rotated object detectors, reaching $82.62$ $\mathrm{AP_{50}}$ on the DOTA dataset and establishing a new state-of-the-art (SOTA). Code will be made public.
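The calibrate-then-filter idea described above can be illustrated with a minimal sketch. The abstract specifies neither the calibration function nor the refinement procedure, so everything below is an assumption made for illustration: the calibrator is plain histogram binning that maps a raw confidence to the mean IoU observed in that score bin on held-out data, the filtering step is standard hard NMS, and the sketch handles a single class with axis-aligned boxes of positive area. The names `fit_binned_calibrator`, `calibrate`, `nms`, and `mixture` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def fit_binned_calibrator(scores, ious, n_bins=10):
    """Histogram binning (assumed stand-in calibrator): score bin -> mean IoU."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(scores, edges[1:-1]), 0, n_bins - 1)
    table = np.empty(n_bins)
    for b in range(n_bins):
        mask = idx == b
        # Fall back to the bin centre when no held-out sample lands in the bin.
        table[b] = ious[mask].mean() if mask.any() else 0.5 * (edges[b] + edges[b + 1])
    return edges, table


def calibrate(scores, calibrator):
    """Replace raw confidences with the calibrated value of their bin."""
    edges, table = calibrator
    idx = np.clip(np.digitize(scores, edges[1:-1]), 0, len(table) - 1)
    return table[idx]


def iou_one_to_many(box, boxes):
    """IoU between one [x1, y1, x2, y2] box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_thr=0.5):
    """Hard NMS (assumed stand-in for the filter-and-refine step)."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        if rest.size == 0:
            break
        # Drop every remaining box that overlaps the kept box too strongly.
        order = rest[iou_one_to_many(boxes[i], boxes[rest]) <= iou_thr]
    return keep


def mixture(detections, calibrators, iou_thr=0.5):
    """Pool calibrated detections from all experts, then filter them jointly.

    detections:  list of (boxes (N, 4), raw_scores (N,)) pairs, one per detector.
    calibrators: list of calibrators fitted per detector on held-out data.
    """
    boxes = np.concatenate([d[0] for d in detections])
    scores = np.concatenate(
        [calibrate(d[1], c) for d, c in zip(detections, calibrators)]
    )
    keep = nms(boxes, scores, iou_thr)
    return boxes[keep], scores[keep]
```

Because NMS ranks the pooled boxes by score, an overconfident detector would dominate the mixture if raw scores were used. Calibrating each detector first puts the scores on a common scale, so the surviving box is the one expected to localize best rather than the one from the detector that happens to output larger numbers.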