A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling

Feature upsampling is a fundamental component of almost all current network architectures for image segmentation. Recently, a popular similarity-based feature upsampling pipeline has been proposed, which uses a high-resolution (HR) feature as guidance to upsample the low-resolution (LR) deep feature based on their local similarity. Although it achieves promising performance, this pipeline has three specific limitations: 1) the HR query and LR key features are not well aligned; 2) the similarity between query and key features is computed in a fixed inner-product form; 3) neighbor selection is performed coarsely on LR features, resulting in mosaic artifacts. These shortcomings make existing methods along this pipeline applicable primarily to hierarchical network architectures that use iterative features as guidance, and they do not readily extend to a broader range of structures, especially for direct high-ratio upsampling. To address these issues, we carefully refine each design component. Specifically, we first propose an explicitly controllable query-key feature alignment from both semantic-aware and detail-aware perspectives, and then construct a parameterized paired central difference convolution block for flexibly calculating the similarity between the well-aligned query-key features. In addition, we develop a fine-grained neighbor selection strategy on HR features, which is simple yet effective in alleviating mosaic artifacts. Based on these designs, we construct a refreshed similarity-based feature upsampling framework named ReSFU. Extensive experiments show that ReSFU applies well to various types of architectures in a direct high-ratio upsampling manner and consistently achieves satisfactory performance on different segmentation applications, demonstrating superior generality and ease of deployment.
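
To make the baseline pipeline concrete, the following is a minimal NumPy sketch of generic similarity-based upsampling as the abstract describes it: each HR query is compared with nearby LR keys through a fixed inner product, and the resulting weights blend the corresponding LR features. It is not the ReSFU implementation. The function name `similarity_upsample`, the `radius` parameter, the nearest-neighbor mapping from HR to LR coordinates, and the softmax normalization are all illustrative assumptions that do not come from the source.

```python
import numpy as np


def similarity_upsample(lr_feat, hr_guide, lr_key, radius=1):
    """Upsample lr_feat (C, h, w) to the spatial size of hr_guide (D, H, W).

    hr_guide supplies the HR queries and lr_key (D, h, w) the LR keys.
    Each HR pixel attends over a (2*radius+1)^2 window of LR positions
    centred on its nearest LR location (hypothetical baseline design).
    """
    C, h, w = lr_feat.shape
    D, H, W = hr_guide.shape
    k = 2 * radius + 1

    # Replicate borders so that every window is complete.
    pad = ((0, 0), (radius, radius), (radius, radius))
    key_p = np.pad(lr_key, pad, mode="edge")
    val_p = np.pad(lr_feat, pad, mode="edge")

    # Nearest LR row/column for every HR row/column.
    ys = np.arange(H) * h // H
    xs = np.arange(W) * w // W

    scores = np.empty((k * k, H, W))
    values = np.empty((k * k, C, H, W))
    n = 0
    for dy in range(k):
        for dx in range(k):
            # In padded coordinates, ys + dy is LR row ys + (dy - radius).
            rows = (ys + dy)[:, None]
            cols = (xs + dx)[None, :]
            keys = key_p[:, rows, cols]                   # (D, H, W)
            values[n] = val_p[:, rows, cols]              # (C, H, W)
            # Fixed inner-product similarity between query and key.
            scores[n] = (hr_guide * keys).sum(axis=0)
            n += 1

    # Softmax over the window (assumed normalization).
    scores -= scores.max(axis=0, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)

    # Weighted sum of the neighboring LR features.
    return (weights[:, None] * values).sum(axis=0)        # (C, H, W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lr_feat = rng.normal(size=(4, 4, 4))       # deep LR feature
    lr_key = rng.normal(size=(2, 4, 4))        # LR keys
    hr_guide = rng.normal(size=(2, 32, 32))    # HR queries, direct 8x ratio
    print(similarity_upsample(lr_feat, hr_guide, lr_key).shape)  # (4, 32, 32)
```

In this sketch, all HR pixels that map to the same LR cell share one neighbor window, which is the coarse LR-side neighbor selection that the abstract associates with mosaic artifacts. The similarity is likewise a plain inner product with no learnable parameters, and no alignment is applied between `hr_guide` and `lr_key`.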
