Self-supervised semantic segmentation methods often fail when faced with appearance ambiguities. We argue that this is due to an over-reliance on unstable, appearance-based cues such as shadows, glare, and local textures. We propose \textbf{GASeg}, a novel framework that bridges appearance and geometry by leveraging stable topological information. The core of our method is the Differentiable Box-Counting (\textbf{DBC}) module, which quantifies multi-scale topological statistics from two parallel streams: geometry-based features and appearance-based features. To force the model to learn these stable structural representations, we introduce Topological Augmentation (\textbf{TopoAug}), an adversarial strategy that simulates real-world ambiguities by applying morphological operators to the input images. A multi-objective loss, \textbf{GALoss}, then explicitly enforces cross-modal alignment between the geometry-based and appearance-based features. Extensive experiments demonstrate that GASeg achieves state-of-the-art performance on four benchmarks, including COCO-Stuff, Cityscapes, and PASCAL, validating our approach of bridging geometry and appearance via topological information.
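To make the box-counting idea concrete, the sketch below shows one way a differentiable box-counting statistic could be computed from a soft activation map; this is a minimal illustration under our own assumptions (soft occupancy via average pooling, a log-log slope as the multi-scale statistic, and hypothetical function names), not the DBC module as implemented in GASeg.

\begin{verbatim}
# Minimal sketch of a differentiable box-counting statistic.
# Assumptions: input is a soft occupancy map in [0, 1]; box counts are
# approximated by pooled occupancy; names are hypothetical, not GASeg's API.
import torch
import torch.nn.functional as F

def soft_box_counts(act_map, box_sizes=(2, 4, 8, 16)):
    """act_map: (B, 1, H, W) soft occupancy in [0, 1].
    Returns log box counts at each scale, differentiable w.r.t. act_map."""
    log_counts = []
    for s in box_sizes:
        # Average occupancy inside each s x s box.
        pooled = F.avg_pool2d(act_map, kernel_size=s, stride=s)
        # Soft count of occupied boxes: sum of per-box occupancy proxies.
        count = pooled.clamp(min=1e-6).sum(dim=(1, 2, 3))
        log_counts.append(torch.log(count))
    return torch.stack(log_counts, dim=1)  # (B, num_scales)

def fractal_dimension(act_map, box_sizes=(2, 4, 8, 16)):
    """Least-squares slope of log N(s) vs. log(1/s): one multi-scale
    topological statistic that could be aligned across feature streams."""
    y = soft_box_counts(act_map, box_sizes)  # (B, S)
    x = torch.log(1.0 / torch.tensor(box_sizes, dtype=y.dtype,
                                     device=y.device))  # (S,)
    x = x - x.mean()  # center so the regression slope has a closed form
    slope = (y * x).sum(dim=1) / (x * x).sum()
    return slope  # (B,) estimated box-counting dimension per sample
\end{verbatim}

Because every operation above is differentiable, a statistic of this kind can be computed on both the geometry-based and appearance-based streams and penalized for disagreement, which is the role the abstract attributes to GALoss.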