We propose an extremely simple and highly effective approach to faithfully combine different object detectors into a Mixture of Experts (MoE) that is more accurate than the individual experts in the mixture. We find that naively combining these experts in the same way as the well-known Deep Ensembles (DEs) does not yield an effective MoE. We identify the incompatibility between the confidence score distributions of different detectors as the primary reason for this failure. Therefore, to construct the MoE, we propose to first calibrate each individual detector against a target calibration function, and then filter and refine all the predictions from the different detectors in the mixture. We term this approach MoCaE and demonstrate its effectiveness through extensive experiments on object detection, instance segmentation and rotated object detection tasks. Specifically, MoCaE (i) improves three strong object detectors on COCO test-dev by $2.4$ $\mathrm{AP}$, reaching $59.0$ $\mathrm{AP}$; (ii) improves instance segmentation methods on the challenging long-tailed LVIS dataset by $2.3$ $\mathrm{AP}$; and (iii) outperforms all existing rotated object detectors, reaching $82.62$ $\mathrm{AP_{50}}$ on the DOTA dataset and establishing a new state of the art (SOTA). Code will be made public.
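The two-stage recipe above (calibrate each detector, then filter and refine the pooled predictions) can be illustrated with a short plain-Python sketch. This is not the authors' implementation, and several pieces are assumptions. The target calibration function is taken to map each raw confidence to the expected IoU between a detection and its matched ground-truth box. Each detector's calibrator is fit with isotonic regression (pool-adjacent-violators). The filter-and-refine step uses class-agnostic greedy NMS with score-weighted box averaging, a swapped-in choice. The function names `fit_isotonic`, `calibrate`, `iou` and `merge_experts`, the IoU threshold of 0.65, and all numbers below are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, score)


def fit_isotonic(scores: Sequence[float], targets: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Fit a non-decreasing step function score -> target via pool-adjacent-violators.

    Returns (thresholds, values): block i starts at thresholds[i] and predicts values[i].
    """
    blocks: List[List[float]] = []  # each block: [start_score, target_sum, count]
    for s, t in sorted(zip(scores, targets)):
        blocks.append([s, t, 1])
        # Pool adjacent blocks while their means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][1] / blocks[-2][2] > blocks[-1][1] / blocks[-1][2]:
            _, t_sum, n = blocks.pop()
            blocks[-1][1] += t_sum
            blocks[-1][2] += n
    return [b[0] for b in blocks], [b[1] / b[2] for b in blocks]


def calibrate(score: float, thresholds: List[float], values: List[float]) -> float:
    """Map a raw score through the fitted step function; scores below the first block are clamped."""
    i = bisect_right(thresholds, score) - 1
    return values[max(i, 0)]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_experts(pooled: List[Detection], iou_thr: float = 0.65) -> List[Detection]:
    """Greedy NMS over detections pooled from all experts; each kept box is
    refined as the score-weighted average of itself and the boxes it suppresses."""
    remaining = sorted(pooled, key=lambda d: d[1], reverse=True)
    kept: List[Detection] = []
    while remaining:
        top_box, top_score = remaining[0]
        cluster = [remaining[0]] + [d for d in remaining[1:] if iou(top_box, d[0]) >= iou_thr]
        remaining = [d for d in remaining[1:] if iou(top_box, d[0]) < iou_thr]
        w = sum(s for _, s in cluster)
        refined = tuple(sum(b[k] * s for b, s in cluster) / w for k in range(4)) if w > 0 else top_box
        kept.append((refined, top_score))
    return kept


# Hypothetical held-out (raw score, IoU with matched ground truth) pairs per expert.
# Expert A is over-confident (raw scores near 1); expert B is under-confident.
cal_a = fit_isotonic([0.80, 0.85, 0.90, 0.95, 0.99], [0.1, 0.3, 0.2, 0.6, 0.8])
cal_b = fit_isotonic([0.2, 0.3, 0.4, 0.5, 0.6], [0.1, 0.2, 0.5, 0.7, 0.9])

raw_a = [((10.0, 10.0, 50.0, 50.0), 0.96), ((100.0, 100.0, 140.0, 140.0), 0.86)]
raw_b = [((12.0, 12.0, 52.0, 52.0), 0.55)]

# Calibrate each expert against the shared target, then pool and merge.
pooled = [(box, calibrate(s, *cal_a)) for box, s in raw_a] + [(box, calibrate(s, *cal_b)) for box, s in raw_b]
for box, score in merge_experts(pooled):
    print([round(v, 2) for v in box], round(score, 2))
# [11.08, 11.08, 51.08, 51.08] 0.7
# [100.0, 100.0, 140.0, 140.0] 0.25
```

On raw scores, expert A's box (0.96) would outrank expert B's overlapping box (0.55). After both are mapped onto the same target scale, B's box scores higher (0.7 vs. 0.6), so it is kept and A's box only contributes to its refined coordinates. Because each calibrator is monotonic, every expert's internal ranking is preserved (up to ties). Only the comparison across detectors changes, and that comparison is the incompatibility the abstract identifies.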