Semantic Sections: An Atlas-Native Feature Ontology for Obstructed Representation Spaces

Recent interpretability work often treats a feature as a single global direction, dictionary atom, or latent coordinate shared across contexts. We argue that this ontology can fail in obstructed representation spaces, where locally coherent meanings need not assemble into one globally consistent feature. We introduce an atlas-native replacement object, the semantic section: a transport-compatible family of local feature representatives defined over a context atlas. We formalize semantic sections, prove that tree-supported propagation is always pathwise realizable, and show that cycle consistency is the key criterion for genuine globalization. This yields a distinction between tree-local, globalizable, and twisted sections, with twisted sections capturing locally coherent but holonomy-obstructed meanings. We then develop a discovery-and-certification pipeline based on seeded propagation, synchronization across overlaps, defect-based pruning, cycle-aware taxonomy, and deduplication. Across layer-16 atlases for Llama 3.2 3B Instruct, Qwen 2.5 3B Instruct, and Gemma 2 2B IT, we find nontrivial populations of semantic sections, including cycle-supported globalizable and twisted regimes after deduplication. Most importantly, semantic identity is not recovered by raw global-vector similarity. Even certified globalizable sections show low cross-chart signed cosine similarity, and raw similarity baselines recover only a small fraction of true within-section pairs, often collapsing at moderate thresholds. By contrast, section-based identity recovery is perfect on certified supports. These results support semantic sections as a better feature ontology in obstructed regimes.
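
To make the tree-local, globalizable, and twisted distinction concrete, the following minimal sketch propagates a seed representative over a toy atlas and checks consistency on the edges that close cycles. It rests on assumptions not stated in the abstract: transports are modeled as orthogonal matrices attached to chart overlaps, the defect is a Euclidean norm, and the tolerance is arbitrary. The names `rot`, `classify_section`, and `cosine` are hypothetical and are not the paper's pipeline.

```python
from collections import deque

import numpy as np


def rot(theta):
    """2D rotation, used here as a stand-in for a transport between two charts."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def cosine(u, v):
    """Raw signed cosine similarity between two local representatives."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def classify_section(transports, seed_chart, seed_vec, tol=1e-6):
    """Propagate a seed along a BFS spanning tree, then test the remaining edges.

    transports: dict {(i, j): T_ij}, where T_ij maps chart-i coordinates to
    chart-j coordinates. Assumption: each T_ij is orthogonal, so its inverse
    is its transpose.
    """
    adj = {}
    for (i, j), T in transports.items():
        adj.setdefault(i, []).append((j, T))
        adj.setdefault(j, []).append((i, T.T))

    # Tree-supported propagation: each chart is reached by exactly one path,
    # so no consistency condition can fail at this stage.
    reps = {seed_chart: np.asarray(seed_vec, dtype=float)}
    tree_edges = set()
    queue = deque([seed_chart])
    while queue:
        i = queue.popleft()
        for j, T in adj.get(i, []):
            if j not in reps:
                reps[j] = T @ reps[i]
                tree_edges.add(frozenset((i, j)))
                queue.append(j)

    # Every non-tree edge closes a cycle; its defect measures the mismatch
    # between the transported representative and the one already assigned.
    defects = [
        float(np.linalg.norm(T @ reps[i] - reps[j]))
        for (i, j), T in transports.items()
        if i in reps and j in reps and frozenset((i, j)) not in tree_edges
    ]

    if not defects:
        label = "tree-local"
    elif max(defects) <= tol:
        label = "globalizable"
    else:
        label = "twisted"
    return label, reps, defects


seed = [1.0, 0.0]

# Triangle whose transports compose to the identity around the cycle.
flat = {(0, 1): rot(0.9), (1, 2): rot(0.8), (0, 2): rot(1.7)}
flat_label, flat_reps, _ = classify_section(flat, 0, seed)

# Same triangle with a closing transport that leaves nontrivial holonomy.
twisted = {(0, 1): rot(0.9), (1, 2): rot(0.8), (0, 2): rot(2.5)}
twisted_label, _, twisted_defects = classify_section(twisted, 0, seed)

# A path with no cycle in its support.
tree = {(0, 1): rot(0.9), (1, 2): rot(0.8)}
tree_label, _, _ = classify_section(tree, 0, seed)

print(flat_label, twisted_label, tree_label)
print("raw cosine, charts 0 and 2 (flat case):", cosine(flat_reps[0], flat_reps[2]))
```

In the flat triangle the representatives on charts 0 and 2 belong to the same section by construction, yet their raw cosine similarity is negative. This is the toy analogue of the claim that semantic identity is not recovered by raw global-vector similarity.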
