SegCompass: Exploring Interpretable Alignment with Sparse Autoencoders for Enhanced Reasoning Segmentation

While large language models provide strong compositional reasoning, existing reasoning segmentation pipelines fail to transparently connect this reasoning to visual perception. Current methods, such as latent query alignment, are end-to-end yet opaque "black boxes". Conversely, textual localization readout is merely readable, not truly interpretable, often functioning as an unconstrained post-hoc step. To bridge this interpretability gap, we propose SegCompass, an end-to-end model that leverages a Sparse Autoencoder (SAE) to forge an explicit, interpretable, and differentiable alignment pathway. Given an image-instruction pair, SegCompass first generates a chain-of-thought (CoT) trace. The core of our method is an SAE that maps both the CoT and visual tokens into a shared, high-dimensional sparse concept space. A query codebook selects salient concepts from this space, which are then spatially grounded by a slot mapper into a multi-slot heatmap that guides the final mask decoder. The entire model is trained jointly, unifying reinforcement learning for the reasoning path with standard segmentation supervision. This SAE-driven interface provides a "white-box" connection that is significantly more traceable than latent queries and more coherent than textual readouts. Extensive experiments on five challenging benchmarks demonstrate that SegCompass matches or surpasses state-of-the-art performance. Crucially, our visual and quantitative analyses show a strong correlation between the quality of the learned sparse concepts and final mask accuracy, confirming that SegCompass achieves superior results through its enhanced and inspectable alignment. Code is available at https://github.com/ZhenyuLU-Heliodore/SegCompass.
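The alignment pathway described above (shared sparse concept space, query codebook, slot mapper) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the top-k sparsity rule, the mean pooling of CoT codes, the element-wise codebook gating, the softmax over patches, and every name in the code (`sae_encode`, `slot_heatmaps`, `W_enc`, `b_enc`, `codebook`, `grid_hw`) are assumptions chosen only to make the data flow concrete. The abstract does not specify any of them.

```python
import numpy as np

def sae_encode(tokens, W_enc, b_enc, k):
    """Hypothetical top-k SAE encoder: keep at most k ReLU activations per token.

    tokens: (n_tokens, d), W_enc: (d, n_concepts), b_enc: (n_concepts,)
    Returns sparse codes of shape (n_tokens, n_concepts).
    """
    acts = np.maximum(tokens @ W_enc + b_enc, 0.0)
    # Indices of the k largest activations in each row.
    top_idx = np.argpartition(acts, -k, axis=1)[:, -k:]
    keep = np.zeros(acts.shape, dtype=bool)
    np.put_along_axis(keep, top_idx, True, axis=1)
    return np.where(keep, acts, 0.0)

def slot_heatmaps(cot_tokens, vis_tokens, W_enc, b_enc, codebook, k, grid_hw):
    """Map CoT and visual tokens through one shared encoder, then ground slots.

    codebook: (n_slots, n_concepts), assumed non-negative concept weights per slot.
    Returns heatmaps of shape (n_slots, H, W); each heatmap sums to 1.
    """
    # Pool the reasoning trace into one concept-activation vector (assumption).
    z_cot = sae_encode(cot_tokens, W_enc, b_enc, k).mean(axis=0)
    # Encode every visual patch in the same concept space.
    z_vis = sae_encode(vis_tokens, W_enc, b_enc, k)
    # Each slot attends only to concepts that the reasoning trace activated.
    slot_weights = codebook * z_cot                 # (n_slots, n_concepts)
    scores = z_vis @ slot_weights.T                 # (n_patches, n_slots)
    # Numerically stable softmax over patches, computed per slot.
    scores = scores - scores.max(axis=0, keepdims=True)
    probs = np.exp(scores)
    probs = probs / probs.sum(axis=0, keepdims=True)
    h, w = grid_hw
    return probs.T.reshape(-1, h, w)
```

Because both modalities pass through the same encoder in this sketch, each heatmap can be traced back to the specific concept indices that are active in both the reasoning trace and the image patches, which is the kind of inspectability the abstract attributes to the SAE interface. In the actual model these heatmaps would guide a mask decoder, which is omitted here.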
