Camouflaged objects typically blend into their backgrounds and exhibit fuzzy boundaries. Complex environmental conditions and the high intrinsic similarity between camouflaged targets and their surroundings make it difficult to locate and segment these objects accurately and in their entirety. Although existing methods perform well in many real-world scenarios, they still struggle with difficult cases such as small targets, thin structures, and indistinct boundaries. Drawing inspiration from how humans inspect images containing camouflaged objects, we propose a three-stage model that performs coarse-to-fine segmentation in a single iteration. Specifically, our model employs three decoders that sequentially process subsampled features, cropped features, and high-resolution original features. This design not only reduces computational overhead but also mitigates interference from background noise. Furthermore, given the importance of multi-scale information, we design a multi-scale feature enhancement module that enlarges the receptive field while preserving detailed structural cues. We also develop a boundary enhancement module that leverages boundary information to improve performance. Finally, we propose a mask-guided fusion module that generates fine-grained results by integrating coarse prediction maps with high-resolution feature maps. Our network surpasses state-of-the-art CNN-based counterparts without unnecessary complexity. Upon acceptance of the paper, the source code will be made publicly available at https://github.com/clelouch/BTSNet.
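The data flow between the first two stages (predict coarsely on subsampled input, then crop the full-resolution input around the coarse prediction) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the `factor` and `margin` parameters, and the fixed 0.5 threshold are all assumptions made for illustration, and a simple threshold on pixel scores stands in for the learned first-stage decoder.

```python
from typing import List, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def subsample(img: Grid, factor: int) -> Grid:
    """Keep every `factor`-th pixel in both directions (nearest neighbour)."""
    return [row[::factor] for row in img[::factor]]


def bounding_box(mask: Grid, threshold: float = 0.5) -> Box:
    """Tight box around all cells whose score exceeds `threshold`."""
    rows = [r for r, row in enumerate(mask) if any(v > threshold for v in row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] > threshold for row in mask)]
    if not rows:
        # Nothing detected: fall back to the whole grid.
        return (0, 0, len(mask), len(mask[0]))
    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)


def scale_and_pad(box: Box, factor: int, margin: int, height: int, width: int) -> Box:
    """Map a coarse-grid box to full resolution, pad it, and clip to the image."""
    top, left, bottom, right = box
    return (
        max(0, top * factor - margin),
        max(0, left * factor - margin),
        min(height, bottom * factor + margin),
        min(width, right * factor + margin),
    )


def crop(img: Grid, box: Box) -> Grid:
    """Extract the sub-grid covered by `box`."""
    top, left, bottom, right = box
    return [row[left:right] for row in img[top:bottom]]


def coarse_to_fine_crop(img: Grid, factor: int = 2, margin: int = 1) -> Tuple[Box, Grid]:
    """Stage 1 (coarse localisation) followed by the crop that feeds stage 2."""
    coarse = subsample(img, factor)
    # Hypothetical stand-in for the first decoder: raw scores act as the coarse map.
    coarse_box = bounding_box(coarse)
    full_box = scale_and_pad(coarse_box, factor, margin, len(img), len(img[0]))
    return full_box, crop(img, full_box)
```

Only the cropped patch would be passed to the later, more expensive stages, which is one way to read the claim that the design reduces computation and suppresses background interference. The margin guards against the coarse prediction clipping thin structures at the object's edge.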