Fine-grained aerial object detection, driven by the intrinsic granularity of real-world object categories, is crucial for advanced scene understanding in remote sensing. Existing methods largely inherit the paradigm of coarse-grained object detection: they rely solely on single-label supervision and therefore struggle to distinguish model-level categories that differ only in subtle structural details. Yet for each specific model (e.g., Boeing 787), structured prior knowledge such as attributes and hierarchies offers discriminative semantics at multiple granularities. Motivated by this, we present ExpertDet, a scheme that incorporates expert-informed cues to enhance fine-grained aerial object detection. Specifically, we design Vision-aware Masked Attribute Modeling (VMAM), which aligns attribute semantics with visual structures by reconstructing randomly masked attributes from visual cues, enabling the detector to capture subtle structural distinctions. We further propose Hierarchical Visual Instance Promotion (HierVIP), which builds a visual prototype tree from hierarchical relations and imposes taxonomy-aware constraints to preserve cross-level semantic continuity while enhancing category discrimination. Moreover, we curate PSP, a new fine-grained object detection benchmark for Precise recognition of model-specific Ships and Planes from aerial imagery. PSP covers 106 ship classes and 30 airplane models, the most extensive collection of model-specific categories among existing aerial object detection datasets. We benchmark state-of-the-art object detection algorithms on PSP. Extensive evaluation demonstrates that ExpertDet consistently outperforms competing fine-grained methods across hierarchy levels. The dataset, benchmark, and code are available at https://nnnnerd.github.io/PSP-Benchmark/.
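To make the prototype-tree idea behind HierVIP more concrete, the following toy sketch may help. It is not the ExpertDet implementation: the abstract does not specify how prototypes are computed or what form the taxonomy-aware constraints take. The sketch assumes that leaf prototypes are per-class feature means, that parent prototypes are means of their children, and that the constraint is a hinge margin that grows with taxonomic distance. All names (`build_prototype_tree`, `taxonomy_margin_loss`, `model_B`, `model_C`, `group_1`, `group_2`) and the 2-D features are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def dist(a, b):
    """Euclidean distance between two vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_prototype_tree(features, labels, parent):
    """Assumed construction: leaf prototype = mean instance feature per class,
    parent prototype = mean of its child prototypes."""
    by_class = defaultdict(list)
    for feat, label in zip(features, labels):
        by_class[label].append(feat)
    leaf = {c: mean(v) for c, v in by_class.items()}

    by_parent = defaultdict(list)
    for c, proto in leaf.items():
        by_parent[parent[c]].append(proto)
    inner = {p: mean(v) for p, v in by_parent.items()}
    return leaf, inner

def taxonomy_margin_loss(feature, label, leaf, parent, margin=1.0):
    """Assumed constraint: a sibling prototype must be at least `margin`
    farther away than the instance's own prototype, and a prototype from
    another branch at least 2 * margin farther. Violations are summed."""
    d_own = dist(feature, leaf[label])
    loss = 0.0
    for c, proto in leaf.items():
        if c == label:
            continue
        required = margin if parent[c] == parent[label] else 2 * margin
        loss += max(0.0, d_own + required - dist(feature, proto))
    return loss

# Hypothetical two-level taxonomy and toy 2-D instance features.
parent = {"Boeing 787": "group_1", "model_B": "group_1", "model_C": "group_2"}
features = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0], [20.0, 0.0], [20.0, 2.0]]
labels = ["Boeing 787", "Boeing 787", "model_B", "model_B", "model_C", "model_C"]

leaf, inner = build_prototype_tree(features, labels, parent)
well_placed = taxonomy_margin_loss([0.0, 1.0], "Boeing 787", leaf, parent)
near_sibling = taxonomy_margin_loss([3.5, 1.0], "Boeing 787", leaf, parent)
```

In this sketch an instance that sits on its own prototype incurs no penalty, while one that drifts toward a sibling prototype is penalized. The larger margin for cross-branch classes is one simple way to encode that confusing two sibling models is less severe than confusing models from different branches.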