Dichotomous Image Segmentation (DIS) is a high-precision object segmentation task for high-resolution natural images. Current mainstream methods focus on optimizing local details but overlook the fundamental challenge of modeling object integrity. We find that the depth integrity-prior implicit in the pseudo-depth maps generated by Depth Anything Model v2, together with the local detail features of image patches, can jointly address this dilemma. Based on these findings, we design a novel Patch-Depth Fusion Network (PDFNet) for high-precision dichotomous image segmentation. The core of PDFNet consists of three components. First, object perception is enhanced through multi-modal input fusion, and a fine-grained patch strategy, coupled with patch selection and enhancement, improves sensitivity to details. Second, leveraging the depth integrity-prior encoded in the depth maps, we propose an integrity-prior loss that enhances the uniformity of the segmentation results with respect to depth. Finally, we reuse the features of the shared encoder and, through a simple depth refinement decoder, improve its ability to capture subtle depth-related information in images. Experiments on the DIS-5K dataset show that PDFNet significantly outperforms state-of-the-art non-diffusion methods. Thanks to the incorporated depth integrity-prior, PDFNet matches or even surpasses the latest diffusion-based methods while using less than 11% of their parameters. The source code is available at https://github.com/Tennine2077/PDFNet
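To make the integrity-prior idea concrete: the abstract does not give the loss formula, but one plausible form penalizes variance of the predicted mask within each pseudo-depth bin, so that pixels at similar depth (likely belonging to the same object) receive uniform segmentation scores. The function name, binning scheme, and variance-based penalty below are illustrative assumptions, not the paper's actual loss:

```python
import numpy as np

def integrity_prior_loss(pred, depth, num_bins=16):
    """Hypothetical sketch of an integrity-prior loss.

    pred  : (H, W) predicted foreground probabilities in [0, 1]
    depth : (H, W) pseudo-depth map normalized to [0, 1]

    Pixels are grouped into depth bins; the loss is the mean variance
    of the predictions inside each non-trivial bin, so a mask that is
    uniform within every depth layer incurs (near-)zero loss.
    """
    # Assign each pixel to a depth bin (clip so depth == 1.0 stays in range).
    bins = np.minimum((depth * num_bins).astype(int), num_bins - 1)
    per_bin_var = []
    for i in range(num_bins):
        vals = pred[bins == i]
        if vals.size > 1:  # variance is only meaningful with >= 2 pixels
            per_bin_var.append(vals.var())
    return float(np.mean(per_bin_var)) if per_bin_var else 0.0
```

A constant prediction yields zero loss regardless of the depth map, while a mask that fragments an object lying at a single depth is penalized, which is the qualitative behavior the abstract ascribes to the integrity-prior loss.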