Towards Any-Quality Image Segmentation via Generative and Adaptive Latent Space Enhancement

Segment Anything Models (SAMs), known for their exceptional zero-shot segmentation performance, have garnered significant attention in the research community. Nevertheless, their performance drops sharply on severely degraded, low-quality images, limiting their effectiveness in real-world scenarios. To address this, we propose GleSAM++, which uses Generative Latent space Enhancement to boost robustness on low-quality images and thereby generalize across image qualities. To improve compatibility between the pre-trained diffusion model and the segmentation framework, we introduce two techniques: Feature Distribution Alignment (FDA) and Channel Replication and Expansion (CRE). These components, however, lack explicit guidance about the degree of degradation: the model must implicitly fit a complex noise distribution spanning mild noise to severe artifacts, which substantially increases the learning burden and leads to suboptimal reconstructions. We therefore further introduce a Degradation-aware Adaptive Enhancement (DAE) mechanism, whose key principle is to decouple the reconstruction of arbitrary-quality features into two stages: degradation-level prediction and degradation-aware reconstruction. Our method can be applied to pre-trained SAM and SAM2 with only minimal additional learnable parameters, allowing for efficient optimization. Extensive experiments demonstrate that GleSAM++ significantly improves segmentation robustness on complex degradations while maintaining generalization to clear images. GleSAM++ also performs well on unseen degradations, underscoring the versatility of our approach and dataset.
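
The two-stage decoupling described for DAE can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not specify the predictor or the reconstruction module, so every function name below is hypothetical, the level estimator is a hand-written roughness score rather than a learned network, and a 3-tap moving average stands in for the generative enhancement. The sketch only shows the control flow: first estimate a degradation level, then let that level decide how strongly the features are modified.

```python
from typing import List


def predict_degradation_level(features: List[float]) -> float:
    """Stage 1 (toy stand-in for a learned predictor): return a level in [0, 1]."""
    if len(features) < 2:
        return 0.0
    # Mean absolute difference between neighbours as a crude roughness score.
    diffs = [abs(b - a) for a, b in zip(features, features[1:])]
    roughness = sum(diffs) / len(diffs)
    return min(1.0, roughness)


def smooth(features: List[float]) -> List[float]:
    """Placeholder enhancer: 3-tap moving average with edge replication."""
    n = len(features)
    out = []
    for i in range(n):
        left = features[max(i - 1, 0)]
        right = features[min(i + 1, n - 1)]
        out.append((left + features[i] + right) / 3.0)
    return out


def degradation_aware_reconstruct(features: List[float], level: float) -> List[float]:
    """Stage 2: blend original and enhanced features, weighted by the level."""
    enhanced = smooth(features)
    return [(1.0 - level) * f + level * e for f, e in zip(features, enhanced)]


def enhance(features: List[float]) -> List[float]:
    """Run both stages: predict the level, then reconstruct conditioned on it."""
    level = predict_degradation_level(features)
    return degradation_aware_reconstruct(features, level)


if __name__ == "__main__":
    clean = [0.5] * 6              # flat features: predicted level is 0
    noisy = [0.0, 1.0, 0.0, 1.0]   # oscillating features: predicted level is 1
    print(enhance(clean))
    print(enhance(noisy))
```

In this toy setup, flat (clean) features receive a level of 0 and pass through unchanged, while strongly oscillating features receive a level of 1 and are fully replaced by their smoothed version. The property being illustrated is that explicit conditioning on the predicted level lets one reconstruction routine behave differently across qualities, instead of fitting all degradation strengths implicitly.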
