Fine-grained object detection (FGOD) extends object detection with fine-grained recognition. In recent two-stage FGOD methods, the region proposal serves as the crucial link between detection and fine-grained recognition. However, current methods overlook that some proposal-related procedures inherited from general detection are not equally suitable for FGOD, which limits multi-task learning across proposal generation, representation, and utilization. In this paper, we present PETDet (Proposal Enhancement for Two-stage fine-grained object detection) to better handle the sub-tasks in two-stage FGOD methods. First, we propose an anchor-free Quality Oriented Proposal Network (QOPN) that uses dynamic label assignment and attention-based decomposition to generate high-quality oriented proposals. Second, we present a Bilinear Channel Fusion Network (BCFN) to extract independent and discriminative features of the proposals. Third, we design a novel Adaptive Recognition Loss (ARL) that guides the R-CNN head to focus on high-quality proposals. Extensive experiments validate the effectiveness of PETDet. Quantitative analysis shows that PETDet with ResNet50 reaches state-of-the-art performance on several FGOD datasets: FAIR1M-v1.0 (42.96 AP), FAIR1M-v2.0 (48.81 AP), MAR20 (85.91 AP), and ShipRSImageNet (74.90 AP). The proposed method also achieves a favorable trade-off between accuracy and inference speed. Our code and models will be released at https://github.com/canoe-Z/PETDet.