During training, supervised object detection must match the predicted bounding boxes and their classification scores to the ground truth. This matching determines which predictions are pushed towards which ground truth objects and which are discarded. Popular matching strategies include matching each prediction to the closest ground truth box (mostly used in combination with anchors) and matching via the Hungarian algorithm (mostly used in anchor-free methods). Each of these strategies comes with its own properties, underlying losses, and heuristics. We show how Unbalanced Optimal Transport unifies these different approaches and opens a whole continuum of methods in between, which allows for a finer selection of the desired properties. Experimentally, we show that training an object detection model with Unbalanced Optimal Transport reaches the state of the art in terms of both Average Precision and Average Recall, and provides faster initial convergence. The approach is well suited to GPU implementation, which proves to be an advantage for large-scale models.
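To make the two matching strategies named above concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes a cost of 1 - IoU between boxes given as (x1, y1, x2, y2), a hypothetical background threshold `max_cost=0.5` for closest-box matching, and a brute-force search that stands in for the Hungarian algorithm (it returns the same minimum-cost assignment, but is only practical for a handful of boxes). All function names are illustrative.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cost_matrix(preds, gts):
    """cost[i][j] = 1 - IoU(prediction i, ground truth j). Assumed cost."""
    return [[1.0 - iou(p, g) for g in gts] for p in preds]


def match_closest(cost, max_cost=0.5):
    """Each prediction takes its cheapest ground truth (many-to-one).

    Predictions whose best cost exceeds max_cost are left unmatched (None),
    i.e. treated as background. The threshold is an assumption.
    """
    matches = []
    for row in cost:
        j = min(range(len(row)), key=lambda k: row[k])
        matches.append(j if row[j] <= max_cost else None)
    return matches


def match_one_to_one(cost):
    """Minimum-cost one-to-one matching by exhaustive search.

    Stand-in for the Hungarian algorithm; assumes at least as many
    predictions as ground truth boxes. Unmatched predictions get None.
    """
    n_pred, n_gt = len(cost), len(cost[0])
    best = min(
        permutations(range(n_pred), n_gt),
        key=lambda perm: sum(cost[i][j] for j, i in enumerate(perm)),
    )
    matches = [None] * n_pred
    for j, i in enumerate(best):
        matches[i] = j
    return matches


preds = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (50, 50, 60, 60)]
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
cost = cost_matrix(preds, gts)

print(match_closest(cost))     # [0, 0, 1, None]
print(match_one_to_one(cost))  # [0, None, 1, None]
```

In this toy example, closest-box matching assigns two overlapping predictions to the same ground truth box, while the one-to-one matching keeps only the best of the two and discards the other. These are the two behaviours between which a continuum of matchings is described above.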