End-to-end region-based object detectors such as Sparse R-CNN usually have multiple cascaded bounding box decoding stages, each of which refines the current predictions based on the results of the previous stage. Model parameters in each stage are independent, incurring a large parameter cost. In this paper, we find that this general setting of decoding stages is actually redundant. By simply sharing parameters across stages to form a recursive decoder, the detector already obtains a significant improvement. The recursive decoder can be further enhanced by a positional encoding (PE) of the proposal box, which makes it aware of the exact locations and sizes of the input bounding boxes and thus adaptive to proposals from different stages during the recursion. Moreover, we design a centerness-based PE to distinguish the RoI feature elements and dynamic convolution kernels at different positions within the bounding box. To validate the effectiveness of the proposed method, we conduct extensive ablations and build the full model on three recent mainstream region-based detectors. RecursiveDet achieves clear performance gains with even fewer model parameters and only a slightly increased computation cost. Code is available at https://github.com/bravezzzzzz/RecursiveDet.
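The parameter-sharing idea can be illustrated with a toy sketch. This is not the released RecursiveDet code: the `Stage` class, the sinusoidal form of `box_pe`, the delta-based box update, and the choice of six stages are all assumptions made for illustration, and RoI features and dynamic convolutions are omitted entirely. The sketch only contrasts a cascade of independent stages with one stage applied recursively, where the stage sees an encoding of the current box.

```python
import math
import random


def box_pe(box, num_freqs=4):
    """Sinusoidal encoding of a normalized (cx, cy, w, h) box.

    Assumed form for illustration; the abstract does not specify the encoding.
    """
    enc = []
    for value in box:
        for k in range(num_freqs):
            angle = (2 ** k) * math.pi * value
            enc.append(math.sin(angle))
            enc.append(math.cos(angle))
    return enc  # length: 4 coordinates * 2 functions * num_freqs


class Stage:
    """Hypothetical stand-in for one decoding stage (a single linear layer)."""

    def __init__(self, in_dim, rng):
        # Four output deltas (dx, dy, dw, dh), each linear in the box encoding.
        self.weight = [[rng.uniform(-0.01, 0.01) for _ in range(in_dim)]
                       for _ in range(4)]
        self.bias = [0.0] * 4

    def num_params(self):
        return sum(len(row) for row in self.weight) + len(self.bias)

    def __call__(self, box):
        pe = box_pe(box)
        dx, dy, dw, dh = (
            sum(w * x for w, x in zip(row, pe)) + b
            for row, b in zip(self.weight, self.bias)
        )
        cx, cy, w, h = box
        # Common R-CNN-style delta update (an assumption here).
        return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


def decode(stages, box):
    """Refine a box by passing it through each stage in order."""
    for stage in stages:
        box = stage(box)
    return box


def count_unique_params(stages):
    """Count parameters, counting a shared stage object only once."""
    unique = {id(s): s for s in stages}
    return sum(s.num_params() for s in unique.values())


rng = random.Random(0)
IN_DIM = 4 * 2 * 4   # matches box_pe with num_freqs=4
NUM_STAGES = 6       # assumed stage count for this toy example

cascade = [Stage(IN_DIM, rng) for _ in range(NUM_STAGES)]  # independent weights
shared = Stage(IN_DIM, rng)
recursive = [shared] * NUM_STAGES                          # same weights reused

proposal = (0.5, 0.5, 0.2, 0.3)
print("cascade params:  ", count_unique_params(cascade))
print("recursive params:", count_unique_params(recursive))
print("refined box:     ", decode(recursive, proposal))
```

In this toy setup the recursive decoder runs the same number of refinement steps as the cascade but holds one sixth of the parameters (132 instead of 792), because the single `shared` stage is reused at every step. The box encoding is what lets that one stage behave differently on coarse early proposals and refined later ones.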