Classification and localization are the two main sub-tasks in object detection. However, the two tasks prefer different feature contexts: localization needs boundary-aware features to regress the bounding box accurately, while classification benefits from richer semantic context. Existing methods usually use disentangled heads to learn a different feature context for each task. These heads are still applied to the same input features, however, which leads to an imperfect balance between classification and localization. In this work, we propose a novel Task-Specific COntext DEcoupling (TSCODE) head that further disentangles the feature encoding for the two tasks. For classification, we generate a spatially coarse but semantically strong feature encoding. For localization, we provide a high-resolution feature map containing more edge information to better regress object boundaries. TSCODE is plug-and-play and can be easily incorporated into existing detection pipelines. Extensive experiments demonstrate that our method consistently improves different detectors by over 1.0 AP at lower computational cost. Our code and models will be publicly released.
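To make the idea of task-specific input features concrete, the following is a minimal sketch, not the TSCODE implementation. It assumes a feature pyramid whose adjacent levels differ by a factor of 2 in resolution, represents each level as a single-channel 2D grid, and fuses levels by element-wise addition. The function names (`classification_context`, `localization_context`) and the choice of pooling, upsampling, and fusion operators are hypothetical and chosen only for illustration.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map (assumption for brevity)


def avg_pool2x(x: Grid) -> Grid:
    """Halve the resolution by averaging each 2x2 block (height and width must be even)."""
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


def upsample2x(x: Grid) -> Grid:
    """Double the resolution by nearest-neighbour repetition."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two grids of identical shape."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, rb_rows(b))]


def rb_rows(b: Grid) -> Grid:
    """Identity helper so that `add` reads as a row-wise zip."""
    return b


def classification_context(p_l: Grid, p_coarser: Grid) -> Grid:
    """Hypothetical classification input: spatially coarse, semantically strong.

    Downsamples the current level and fuses it with the next coarser level.
    """
    return add(avg_pool2x(p_l), p_coarser)


def localization_context(p_finer: Grid, p_l: Grid) -> Grid:
    """Hypothetical localization input: high resolution, more edge detail.

    Upsamples the current level and fuses it with the next finer level.
    """
    return add(p_finer, upsample2x(p_l))


if __name__ == "__main__":
    p_finer = [[0.0] * 4 for _ in range(4)]   # 4x4, finest level
    p_l = [[1.0, 2.0], [3.0, 4.0]]            # 2x2, current level
    p_coarser = [[10.0]]                      # 1x1, coarsest level
    print(classification_context(p_l, p_coarser))
    print(localization_context(p_finer, p_l))
```

In this sketch the two branches receive inputs of different resolution built from different pyramid levels, which is the decoupling the abstract describes; a real detector would operate on multi-channel tensors and would likely use learned convolutions rather than plain addition.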