Despite promising results, existing oriented object detection methods usually rely on heuristically designed rules, e.g., RRoI generation and rotated NMS. In this paper, we propose an end-to-end framework for oriented object detection that simplifies the model pipeline and obtains superior performance. Our framework is based on DETR, with the box regression head replaced by a points prediction head. Learning points is more flexible, and the distribution of the points reflects the angle and size of the target rotated box. We further propose decoupling the query features into classification and regression features, which significantly improves model precision. Aerial images usually contain thousands of instances. To better balance model precision and efficiency, we propose a novel dynamic query design that reduces the number of object queries in stacked decoder layers without sacrificing model performance. Finally, we rethink the label assignment strategy of existing DETR-like detectors and propose an effective label re-assignment strategy for improved performance. We name our method D2Q-DETR. Experiments on the large and challenging DOTA-v1.0 and DOTA-v1.5 datasets show that D2Q-DETR outperforms existing NMS-based and NMS-free oriented object detection methods and achieves a new state of the art.
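To make two of the ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say how a predicted point set is converted into a rotated box or how queries are dropped between decoder layers, so both choices here are assumptions: the box is fitted with a simple PCA of the points, and queries are pruned by keeping the top-scoring ones under a linearly decreasing schedule. The names `points_to_rotated_box`, `query_schedule`, and `prune_queries` are hypothetical.

```python
import math


def points_to_rotated_box(points):
    """Fit (cx, cy, w, h, theta) to 2D points via PCA (assumed conversion)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Second-order moments of the point distribution.
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Orientation of the principal axis, in radians.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    # Project points onto the principal axis (u) and its normal (v).
    u = [(x - mx) * c + (y - my) * s for x, y in points]
    v = [-(x - mx) * s + (y - my) * c for x, y in points]
    w, h = max(u) - min(u), max(v) - min(v)
    # Box centre is the midpoint of the extents, mapped back to image axes.
    mu, mv = (max(u) + min(u)) / 2.0, (max(v) + min(v)) / 2.0
    cx = mx + mu * c - mv * s
    cy = my + mu * s + mv * c
    return cx, cy, w, h, theta


def query_schedule(num_first, num_last, num_layers):
    """Number of queries kept at each decoder layer (assumed linear decay)."""
    if num_layers == 1:
        return [num_first]
    step = (num_first - num_last) / (num_layers - 1)
    return [round(num_first - i * step) for i in range(num_layers)]


def prune_queries(scores, keep):
    """Indices of the `keep` highest-scoring queries, in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    corners = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
    print(points_to_rotated_box(corners))
    print(query_schedule(900, 400, 6))
    print(prune_queries([0.1, 0.9, 0.5, 0.7], 2))
```

In this sketch, the spread of the points along the principal axis and its normal gives the box size, and the axis direction gives the angle, which is one way a point distribution can encode a rotated box. The schedule shows how later decoder layers could process fewer queries; the specific counts are illustrative only.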