Segmentation of ultra-high resolution (UHR) images is a critical task with numerous applications, yet it poses significant challenges due to the enormous spatial resolution and rich fine details of such images. Recent approaches adopt a dual-branch architecture, in which a global branch learns long-range contextual information while a local branch captures fine details. However, these methods struggle to resolve the conflict between global and local information and incur significant extra computational cost. Inspired by the human visual system's ability to rapidly orient attention to important, detail-rich areas and filter out irrelevant information, we propose a novel UHR segmentation method called the Boundary-enhanced Patch-merging Transformer (BPT). BPT consists of two key components: (1) a Patch-Merging Transformer (PMT) that dynamically allocates tokens to informative regions to acquire global and local representations, and (2) a Boundary-Enhanced Module (BEM) that leverages boundary information to enrich fine details. Extensive experiments on multiple UHR image segmentation benchmarks demonstrate that BPT outperforms previous state-of-the-art methods without introducing extra computational overhead. Code will be released to facilitate research.
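The token-allocation idea behind the PMT can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's actual algorithm: patch tokens are scored by an informativeness proxy (here, plain feature variance stands in for a learned score), the top-scoring tokens are kept at fine granularity, and the remainder are merged into a single coarse token so compute concentrates on detail-rich regions.

```python
import numpy as np

def merge_patches(tokens, scores, keep_ratio=0.25):
    """Keep the highest-scoring patch tokens at full resolution and
    average-merge the rest into one coarse token.

    tokens: (N, D) patch embeddings; scores: (N,) informativeness proxy.
    """
    n_keep = max(1, int(len(tokens) * keep_ratio))
    order = np.argsort(scores)[::-1]   # most informative first
    kept = tokens[order[:n_keep]]
    rest = tokens[order[n_keep:]]
    if len(rest) > 0:
        merged = rest.mean(axis=0, keepdims=True)  # one coarse summary token
        return np.concatenate([kept, merged], axis=0)
    return kept

# Toy example: 16 patch tokens of dimension 8
rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 8))
scores = tokens.var(axis=1)  # stand-in for a learned informativeness score
out = merge_patches(tokens, scores, keep_ratio=0.25)
print(out.shape)  # (5, 8): 4 kept fine tokens + 1 merged coarse token
```

The key property is that the sequence fed to subsequent attention layers shrinks from N tokens to roughly N x keep_ratio + 1, which is what lets such a design avoid the extra cost of a separate local branch.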
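The boundary-enhancement idea can likewise be sketched in miniature. Again this is an illustrative assumption rather than the actual BEM design: a boundary-strength map is estimated from the gradients of a class probability map, then used to reweight features so that responses near object boundaries are amplified.

```python
import numpy as np

def boundary_map(prob, eps=1e-6):
    """Approximate class-boundary strength of an (H, W) probability map
    via finite-difference gradients, normalised to [0, 1]."""
    gy, gx = np.gradient(prob)
    mag = np.hypot(gx, gy)
    return mag / (mag.max() + eps)

def boundary_enhance(feats, prob, alpha=0.5):
    """Reweight (C, H, W) features so boundary regions are amplified."""
    b = boundary_map(prob)            # (H, W) boundary strength
    return feats * (1.0 + alpha * b)  # broadcast over channels

# Toy example: a sharp vertical edge in an 8x8 probability map
prob = np.zeros((8, 8))
prob[:, 4:] = 1.0                 # left half class 0, right half class 1
feats = np.ones((3, 8, 8))
out = boundary_enhance(feats, prob)
print(out[0, 0, 3], out[0, 0, 0])  # boundary column amplified, flat region unchanged
```

Here `alpha` (a hypothetical hyperparameter) controls how strongly boundary evidence modulates the features; in a real module the boundary map would typically be predicted by a learned head rather than computed from gradients.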