The KFIoU Loss for Rotated Object Detection

From arXiv: 17 pages, 6 figures, 7 tables; accepted by ICLR 2023. TensorFlow code: https://github.com/yangxue0827/RotationDetection, PyTorch code: https://github.com/open-mmlab/mmrotate, Jittor code: https://github.com/Jittor/JDet

Unlike the well-developed horizontal object detection area, where the computing-friendly IoU-based loss is readily adopted and fits well with the detection metrics, rotation detectors often involve a more complicated loss based on SkewIoU, which is unfriendly to gradient-based training. In this paper, we propose an effective approximate SkewIoU loss based on Gaussian modeling and the Kalman filter, which mainly consists of two terms. The first term is a scale-insensitive center point loss, which is used to quickly bring the center points of the bounding boxes closer together to assist the second term. In the distance-independent second term, the Kalman filter is adopted to inherently mimic the mechanism of SkewIoU by its definition, and we show its alignment with the SkewIoU loss at the trend level within a certain distance (i.e., within 9 pixels). This is in contrast to recent Gaussian modeling based rotation detectors, e.g., GWD loss and KLD loss, which involve a human-specified distribution distance metric that requires additional hyperparameter tuning that varies across datasets and detectors. The resulting new loss, called KFIoU loss, is easier to implement and works better than the exact SkewIoU loss, thanks to its full differentiability and its ability to handle non-overlapping cases. We further extend our technique to the 3-D case, which suffers from the same issues as 2-D detection. Extensive results on various public datasets (2-D/3-D, aerial/text/face images) with different base detectors show the effectiveness of our approach.
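
To make the distance-independent overlap term more concrete, here is a minimal NumPy sketch of how a Kalman-filter-based overlap between two rotated boxes might be computed. The abstract does not spell out the formulas, so the details below are assumptions based on the usual Gaussian-modeling recipe: a box (x, y, w, h, theta) is mapped to a Gaussian with covariance R diag(w^2/4, h^2/4) R^T, the two covariances are fused with the Kalman gain, and box areas are recovered as 4 * sqrt(det(covariance)). The function names are hypothetical and are not taken from the released code; the center point loss (the first term) is not shown.

```python
import numpy as np


def box_to_gaussian(x, y, w, h, theta):
    """Map a rotated box (center, size, angle in radians) to (mean, covariance).

    Assumed convention: covariance = R @ diag(w^2/4, h^2/4) @ R.T
    """
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    scale = np.diag([w ** 2 / 4.0, h ** 2 / 4.0])
    return np.array([x, y], dtype=float), rot @ scale @ rot.T


def area_from_cov(cov):
    """Area of the box encoded by a 2-D covariance: 4 * sqrt(det(cov))."""
    return 4.0 * np.sqrt(np.linalg.det(cov))


def kfiou_overlap(box1, box2):
    """Approximate overlap ratio from the Kalman-fused covariance.

    Only the covariances are used, so the value does not depend on the
    distance between the two centers (that is left to the center loss).
    """
    _, cov1 = box_to_gaussian(*box1)
    _, cov2 = box_to_gaussian(*box2)
    gain = cov1 @ np.linalg.inv(cov1 + cov2)  # Kalman gain
    cov3 = cov1 - gain @ cov1                 # covariance of the fused Gaussian
    a1, a2, a3 = area_from_cov(cov1), area_from_cov(cov2), area_from_cov(cov3)
    return a3 / (a1 + a2 - a3)


if __name__ == "__main__":
    same = kfiou_overlap((0, 0, 4, 2, 0.3), (0, 0, 4, 2, 0.3))
    rotated = kfiou_overlap((0, 0, 4, 2, 0.0), (0, 0, 4, 2, np.pi / 2))
    print(f"identical boxes: {same:.4f}, perpendicular boxes: {rotated:.4f}")
```

Under these assumptions, two identical boxes give a value of 1/3 rather than 1, because fusing two equal covariances halves them, so the quantity tracks the trend of SkewIoU rather than reproducing its value. Every step is built from differentiable matrix operations, and the result stays well defined when the boxes do not overlap.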
