Recent query-based detectors have made remarkable progress, yet their performance remains limited on objects with arbitrary orientations, especially tiny objects that carry little texture information. This limitation stems primarily from two sources: the underutilization of intrinsic geometry during pixel-based feature decoding, and the inter-stage matching inconsistency caused by stage-wise bipartite matching. To tackle these challenges, we present IGOFormer, a novel query-based oriented object detector that explicitly integrates intrinsic geometry into feature decoding and improves inter-stage matching stability. Specifically, we design an Intrinsic Geometry-aware Decoder that enhances the object-related features conditioned on an object query by injecting complementary geometric embeddings extrapolated from the correlations between the query and these features. These embeddings capture the geometric layout of the object and thereby provide critical geometric cues about its orientation. Meanwhile, we develop a Momentum-based Bipartite Matching scheme that adaptively aggregates historical matching costs through an exponential moving average with query-specific smoothing factors, preventing the conflicting supervisory signals that arise from inter-stage matching inconsistency. Extensive experiments and ablation studies demonstrate the superiority of IGOFormer for aerial oriented object detection: with a Swin-T backbone under the single-scale setting, it achieves an AP$_{50}$ of 78.00\% on DOTA-V1.0. The code will be made publicly available.
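To make the idea of momentum-based matching concrete, the following is a minimal sketch under stated assumptions, not the IGOFormer implementation. The abstract does not say how the query-specific smoothing factors are computed, so here they are simply passed in as a hypothetical `alphas` list, one value per query, with alpha weighting the history (an assumed convention). Each decoder stage's query-by-target cost matrix is blended row-wise into a running exponential moving average, and the assignment is solved on the smoothed costs. For self-containment, an exact brute-force search over permutations stands in for the Hungarian algorithm (in practice one would use a solver such as `scipy.optimize.linear_sum_assignment`). All function names are hypothetical.

```python
from itertools import permutations
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]  # rows = queries, columns = ground-truth targets


def ema_costs(prev: Optional[Matrix], curr: Matrix, alphas: Sequence[float]) -> Matrix:
    """Row-wise EMA: row i mixes history and the current cost with query i's factor.

    Assumed form: smoothed_i = alpha_i * prev_i + (1 - alpha_i) * curr_i.
    """
    if prev is None:  # first decoder stage: no history to aggregate yet
        return [row[:] for row in curr]
    return [
        [a * p + (1.0 - a) * c for p, c in zip(prev_row, curr_row)]
        for a, prev_row, curr_row in zip(alphas, prev, curr)
    ]


def match(cost: Matrix) -> List[Tuple[int, int]]:
    """Exact one-to-one assignment by brute force (stand-in for the Hungarian method).

    Assumes #queries >= #targets; only suitable for toy sizes.
    Returns sorted (query, target) pairs.
    """
    n_q, n_t = len(cost), len(cost[0])
    best, best_perm = float("inf"), None
    for perm in permutations(range(n_q), n_t):  # perm[j] = query assigned to target j
        total = sum(cost[q][j] for j, q in enumerate(perm))
        if total < best:
            best, best_perm = total, perm
    return sorted((q, j) for j, q in enumerate(best_perm))


def momentum_matching(
    stage_costs: Sequence[Matrix], alphas: Sequence[float]
) -> List[List[Tuple[int, int]]]:
    """Match every decoder stage on EMA-smoothed costs instead of raw per-stage costs."""
    smoothed: Optional[Matrix] = None
    results = []
    for cost in stage_costs:
        smoothed = ema_costs(smoothed, cost, alphas)
        results.append(match(smoothed))
    return results


if __name__ == "__main__":
    # Stage 2's raw costs alone would swap the assignment chosen at stage 1.
    stages = [[[0.1, 0.9], [0.9, 0.1]], [[0.6, 0.5], [0.5, 0.6]]]
    print(momentum_matching(stages, [0.8, 0.8]))  # [[(0, 0), (1, 1)], [(0, 0), (1, 1)]]
    print(momentum_matching(stages, [0.0, 0.0]))  # [[(0, 0), (1, 1)], [(0, 1), (1, 0)]]
```

In this toy run, setting every factor to 0 recovers plain stage-wise matching, and the assignment flips between stages. That is the kind of inter-stage inconsistency the scheme targets. With factors of 0.8, the accumulated history keeps the stage-1 assignment stable.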