With the publication of DINO, a variant of the Detection Transformer (DETR), Detection Transformers have set new records on object detection benchmarks thanks to their end-to-end design and scalability. However, extending DETR to oriented object detection has not been thoroughly studied, even though its end-to-end architecture is expected to bring further benefits there, such as removing NMS and anchor-related costs. In this paper, we propose a first strong DINO-based baseline for oriented object detection. We find that straightforwardly applying DETRs to oriented object detection does not guarantee non-duplicate predictions, and we propose a simple cost to mitigate this. Furthermore, we introduce a novel denoising strategy that uses Hungarian matching to filter redundant noised queries and query alignment to preserve matching consistency between Transformer decoder layers. Our proposed model outperforms previous rotated DETRs and other counterparts, achieving state-of-the-art performance on the DOTA-v1.0/v1.5/v2.0 and DIOR-R benchmarks.