Combining the strengths of many existing predictors into a Mixture of Experts that outperforms its individual components is an effective way to improve performance without developing new architectures or training a model from scratch. Surprisingly, however, we find that naively combining expert object detectors, in a manner similar to Deep Ensembles, can often degrade performance. We identify the primary cause as miscalibration: the experts' predictions do not match their performance. Consequently, the most confident detector dominates the final predictions, preventing the mixture from leveraging all the experts' predictions appropriately. To address this, we propose to combine the experts' predictions in a manner that reflects their individual performance, an objective we achieve by first calibrating the predictions and then filtering and refining them. We term this approach the Mixture of Calibrated Experts and demonstrate its effectiveness through extensive experiments on five different detection tasks using a variety of detectors, showing that it: (i) improves object detectors on COCO and instance segmentation methods on LVIS by up to $\sim 2.5$ AP; (ii) reaches state-of-the-art on COCO test-dev with $65.1$ AP and on DOTA with $82.62$ $\mathrm{AP_{50}}$; (iii) outperforms single models consistently on recent detection tasks such as Open Vocabulary Object Detection.
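The calibrate-then-filter idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the function names (`iou`, `nms`, `mixture`), the linear rescaling used as a stand-in calibrator, the use of greedy NMS as the filtering step, and the toy boxes and scores are all hypothetical. The refinement step mentioned in the abstract is omitted.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Detection], iou_thr: float = 0.5) -> List[Detection]:
    """Greedy NMS: visit boxes by descending score, drop those overlapping a kept box."""
    kept: List[Detection] = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(det[0], k[0]) < iou_thr for k in kept):
            kept.append(det)
    return kept


def mixture(
    expert_dets: Sequence[List[Detection]],
    calibrators: Sequence[Callable[[float], float]],
    iou_thr: float = 0.5,
) -> List[Detection]:
    """Map each expert's scores through its own calibrator, pool, then filter with NMS."""
    pooled: List[Detection] = []
    for dets, calibrate in zip(expert_dets, calibrators):
        pooled.extend((box, calibrate(score)) for box, score in dets)
    return nms(pooled, iou_thr)


# Toy data (hypothetical): expert A is overconfident, expert B is underconfident.
box_a1: Box = (0.0, 0.0, 10.0, 10.0)
box_a2: Box = (20.0, 20.0, 30.0, 30.0)
box_b1: Box = (1.0, 1.0, 11.0, 11.0)  # overlaps box_a1 heavily
expert_a = [(box_a1, 0.95), (box_a2, 0.90)]
expert_b = [(box_b1, 0.60)]


def identity(s: float) -> float:
    return s


# Without calibration, expert A wins every overlap because its raw scores are higher.
naive = mixture([expert_a, expert_b], [identity, identity])

# Stand-in calibrators (assumed): shrink A's scores, boost B's scores.
calibrated = mixture(
    [expert_a, expert_b],
    [lambda s: 0.5 * s, lambda s: min(1.0, 1.2 * s)],
)

print([box for box, _ in naive])
print([box for box, _ in calibrated])
```

In this toy setting the uncalibrated mixture keeps only expert A's boxes, while the calibrated mixture lets expert B's overlapping box survive NMS. In practice a calibrator would be fitted on held-out data rather than chosen by hand as it is here.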