High-accuracy Dichotomous Image Segmentation (DIS) aims to pinpoint category-agnostic foreground objects in natural scenes. The main challenge of DIS lies in identifying the dominant object region with high accuracy while also rendering detailed object structures. However, directly applying a generic encoder-decoder architecture may lead to an oversupply of high-level features and neglect the shallow spatial information needed to segment fine structures. To fill this gap, we introduce a novel Unite-Divide-Unite Network (UDUN) that restructures and bipartitely arranges complementary features to simultaneously boost the effectiveness of trunk and structure identification. UDUN offers several strengths. First, a dual-size input is fed into a shared backbone to produce more holistic and detailed features while keeping the model lightweight. Second, a simple Divide-and-Conquer Module (DCM) decouples multiscale low- and high-level features into our structure decoder and trunk decoder to obtain structure and trunk information, respectively. Third, we design a Trunk-Structure Aggregation module (TSA) in our union decoder that performs cascade integration for uniform high-accuracy segmentation. As a result, UDUN performs favorably against state-of-the-art competitors on all six evaluation metrics on the overall DIS-TE set; for example, it achieves a weighted F-measure of 0.772 and an HCE of 977. With 1024×1024 input, our model runs in real time at 65.3 fps with a ResNet-18 backbone.
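To make the data flow of the dual-size input and the DCM split more concrete, here is a minimal, shape-only sketch in Python. It is not the authors' implementation and carries no weights. The half-size (512×512) second input, the rule that the structure decoder receives the two shallowest stages of the full-size pass while the trunk decoder receives the two deepest stages of the half-size pass, and the helper names `shared_backbone`, `divide`, and `relative_pixel_cost` are all hypothetical assumptions for illustration. Only the stage strides (4, 8, 16, 32) are standard for ResNet-18. The TSA cascade is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass

# Output strides of the four residual stages in a standard ResNet-18.
STAGE_STRIDES = (4, 8, 16, 32)


@dataclass(frozen=True)
class FeatureMap:
    source: str  # which input produced it: "full" or "half" (assumed sizes)
    stage: int   # 1-based backbone stage index (1 = shallowest)
    height: int
    width: int


def shared_backbone(size: int, source: str) -> list[FeatureMap]:
    """Shapes of the multiscale features one shared backbone would emit
    for a square input of side `size`. The same backbone (same weights)
    is assumed to be reused for both input sizes, so no extra parameters."""
    return [FeatureMap(source, i + 1, size // s, size // s)
            for i, s in enumerate(STAGE_STRIDES)]


def divide(full: list[FeatureMap], half: list[FeatureMap],
           num_low: int = 2) -> tuple[list[FeatureMap], list[FeatureMap]]:
    """Hypothetical DCM-style routing (an assumption, not the paper's rule):
    shallow, high-resolution stages of the full-size pass feed the structure
    decoder; deep stages of the half-size pass feed the trunk decoder."""
    structure = [f for f in full if f.stage <= num_low]
    trunk = [f for f in half if f.stage > num_low]
    return structure, trunk


def relative_pixel_cost(full_size: int, small_size: int) -> float:
    """Pixels processed by both passes relative to a single full-size pass.
    Convolutional backbone compute grows roughly linearly with pixel count,
    so this approximates the overhead of the second, smaller input."""
    return (full_size ** 2 + small_size ** 2) / full_size ** 2
```

Under these assumptions, a 1024×1024 image plus its 512×512 copy gives the structure decoder 256×256 and 128×128 maps and the trunk decoder 32×32 and 16×16 maps, while the extra pass adds only 25% more pixels (`relative_pixel_cost(1024, 512) == 1.25`) and no extra backbone parameters.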