LDA-AQU: Adaptive Query-guided Upsampling via Local Deformable Attention

Feature upsampling is an essential operation in building deep convolutional neural networks. However, existing upsamplers either lack specific feature guidance or require high-resolution feature maps, which costs them performance and flexibility. In this paper, we observe that local self-attention naturally provides feature guidance and that its computational paradigm closely matches the essence of feature upsampling (\ie reassembling the features of neighboring points). We therefore introduce local self-attention into the upsampling task and show that most existing upsamplers can be regarded as special cases of a local self-attention-based upsampler. Because a semantic gap may exist between an upsampled point and its neighboring points, we further introduce a deformation mechanism into this upsampler, yielding LDA-AQU. As a novel dynamic kernel-based upsampler, LDA-AQU uses query features to guide the model in adaptively adjusting both the positions and the aggregation weights of neighboring points, allowing it to meet upsampling requirements across a variety of complex scenarios. In addition, LDA-AQU is lightweight and can be easily integrated into various model architectures. We evaluate LDA-AQU on four dense prediction tasks: object detection, instance segmentation, panoptic segmentation, and semantic segmentation. LDA-AQU consistently outperforms previous state-of-the-art upsamplers, improving on the baseline models by 1.7 AP, 1.5 AP, 2.0 PQ, and 2.5 mIoU on these four tasks, respectively. Code is available at \url{https://github.com/duzw9311/LDA-AQU}.
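To make the local self-attention view of upsampling concrete, the NumPy sketch below treats each upsampled point as a query that attends over a k×k neighborhood of its source location in the low-resolution map, with each neighbor's sampling position optionally shifted by query-dependent offsets. It is a minimal illustration under stated assumptions, not the authors' implementation: the names `bilinear_sample`, `lda_style_upsample`, `offset_fn`, and `shift` are hypothetical; queries are assumed to be nearest-neighbor copies of the low-resolution feature; keys and values use identity projections instead of learned ones; out-of-range samples are clamped to the border; and the offset predictor, which LDA-AQU learns from query features, is passed in as an arbitrary callable.

```python
import numpy as np


def bilinear_sample(feat, y, x):
    """Bilinearly sample feat (C, H, W) at a real-valued (y, x).

    Out-of-range coordinates are clamped to the border (an assumption of this sketch).
    """
    _, H, W = feat.shape
    y = float(np.clip(y, 0, H - 1))
    x = float(np.clip(x, 0, W - 1))
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    fy, fx = y - y0, x - x0
    return ((1 - fy) * (1 - fx) * feat[:, y0, x0]
            + (1 - fy) * fx * feat[:, y0, x1]
            + fy * (1 - fx) * feat[:, y1, x0]
            + fy * fx * feat[:, y1, x1])


def lda_style_upsample(feat, scale=2, k=3, offset_fn=None):
    """Hypothetical local (deformable) self-attention upsampler.

    Each output pixel is a query attending over a k x k neighbourhood (k odd) of
    its source location. offset_fn(q) -> (k*k, 2) array of (dy, dx) shifts is a
    stand-in for a learned, query-conditioned offset predictor.
    """
    C, H, W = feat.shape
    r = k // 2
    grid = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    out = np.zeros((C, H * scale, W * scale))
    for i in range(H * scale):
        for j in range(W * scale):
            cy, cx = i // scale, j // scale      # source location in the low-res map
            q = feat[:, cy, cx]                  # query: nearest-neighbour feature (assumption)
            if offset_fn is None:
                offs = np.zeros((len(grid), 2))  # no deformation: fixed k x k window
            else:
                offs = np.asarray(offset_fn(q), dtype=float)
            assert offs.shape == (len(grid), 2)
            # Keys and values share identity projections here (no learned weights).
            kv = np.stack([bilinear_sample(feat, cy + dy + oy, cx + dx + ox)
                           for (dy, dx), (oy, ox) in zip(grid, offs)])  # (k*k, C)
            logits = kv @ q / np.sqrt(C)         # scaled dot-product similarity
            w = np.exp(logits - logits.max())    # numerically stable softmax
            w /= w.sum()
            out[:, i, j] = w @ kv                # reassemble neighbours with attention weights
    return out


def shift(q):
    # Hypothetical stand-in for a learned offset predictor: same shift for all 9 neighbours.
    return np.tile([0.5, -0.25], (9, 1))


rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 5, 5))
up = lda_style_upsample(feat, scale=2, k=3, offset_fn=shift)
print(up.shape)  # (4, 10, 10)
```

With `k=1` and no offsets, the softmax runs over a single neighbor, so the output reduces exactly to nearest-neighbor upsampling, one small instance of the claim that existing upsamplers are special cases of the local self-attention formulation. The explicit Python loops favor readability over speed; a practical version would vectorize the sampling and learn the projections and offsets end to end.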
