Group Crosscoders for Mechanistic Analysis of Symmetry

We introduce group crosscoders, an extension of crosscoders that systematically discovers and analyses symmetrical features in neural networks. While neural networks often develop equivariant representations without explicit architectural constraints, understanding these emergent symmetries has traditionally relied on manual analysis. Group crosscoders automate this process by performing dictionary learning across transformed versions of inputs under a symmetry group. Applied to InceptionV1's mixed3b layer using the dihedral group $\mathrm{D}_{32}$, our method yields two key insights. First, it naturally clusters features into interpretable families that correspond to previously hypothesised feature types, separating them more precisely than standard sparse autoencoders do. Second, our transform block analysis automatically characterises feature symmetries, revealing how different geometric features (such as curves versus lines) exhibit distinct patterns of invariance and equivariance. These results demonstrate that group crosscoders can provide systematic insights into how neural networks represent symmetry, offering a promising new tool for mechanistic interpretability.
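To make "transformed versions of inputs under a symmetry group" concrete, the following is a minimal sketch of how the orbit of a single input under a dihedral group might be enumerated. It is not the paper's implementation. It assumes the 8-element dihedral group of the square (four rotations and their mirror images) rather than $\mathrm{D}_{32}$, because 90-degree rotations act exactly on a pixel grid without interpolation. The names `rot90`, `flip`, and `dihedral_orbit` are hypothetical helpers introduced only for illustration.

```python
from typing import List

Grid = List[List[int]]


def rot90(x: Grid) -> Grid:
    """Rotate a square grid by 90 degrees counter-clockwise."""
    n = len(x)
    return [[x[j][n - 1 - i] for j in range(n)] for i in range(n)]


def flip(x: Grid) -> Grid:
    """Mirror a grid left-to-right."""
    return [row[::-1] for row in x]


def dihedral_orbit(x: Grid) -> List[Grid]:
    """Return the 8 images of x under the dihedral group of the square.

    Assumption: this small group stands in for the larger dihedral group
    named in the abstract. The first four entries are rotations of x,
    the last four are rotations of the mirrored grid.
    """
    orbit: List[Grid] = []
    for base in (x, flip(x)):
        g = base
        for _ in range(4):
            orbit.append(g)
            g = rot90(g)
    return orbit


# A tiny 2x2 "image" with four distinct values, so every group element
# produces a visibly different transformed input.
image = [[1, 2], [3, 4]]
orbit = dihedral_orbit(image)
```

In a setting like the one the abstract describes, each element of `orbit` would presumably be passed through the network and the resulting activations collected, one set per group element, before dictionary learning is applied across them; that step is omitted here because the abstract does not specify it.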

