We introduce group crosscoders, an extension of crosscoders that systematically discovers and analyses symmetric features in neural networks. Neural networks often develop equivariant representations without explicit architectural constraints, but understanding these emergent symmetries has traditionally relied on manual analysis. Group crosscoders automate this process by performing dictionary learning across transformed versions of inputs under a symmetry group. Applied to InceptionV1's mixed3b layer using the dihedral group $\mathrm{D}_{32}$, our method yields two key insights. First, it naturally clusters features into interpretable families that correspond to previously hypothesised feature types, separating them more precisely than standard sparse autoencoders. Second, our transform block analysis automatically characterises feature symmetries, revealing how different geometric features (such as curves versus lines) exhibit distinct patterns of invariance and equivariance. These results demonstrate that group crosscoders can provide systematic insights into how neural networks represent symmetry, offering a promising new tool for mechanistic interpretability.
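To make "dictionary learning across transformed versions of inputs under a symmetry group" concrete, the following is a minimal sketch rather than the method as implemented. It rests on several assumptions that do not come from the text above: it uses the eight symmetries of a square (four quarter-turn rotations, each optionally mirrored) instead of $\mathrm{D}_{32}$, a random linear readout as a hypothetical stand-in for a network layer, a jointly encoded stack of activations, and a norm-weighted L1 sparsity penalty. The names `square_symmetries` and `ToyGroupCrosscoder` are illustrative only.

```python
import numpy as np


def square_symmetries(image):
    """Return the 8 views of a square image: 4 quarter-turn rotations,
    each with and without a left-right mirror flip."""
    views = []
    for mirrored in (False, True):
        base = np.fliplr(image) if mirrored else image
        for quarter_turns in range(4):
            views.append(np.rot90(base, quarter_turns))
    return views


class ToyGroupCrosscoder:
    """Untrained toy dictionary with one decoder block per group element.

    Assumption: features are encoded jointly from the activations of all
    transformed inputs and decoded back to every transformed activation.
    """

    def __init__(self, n_transforms, d_act, n_features, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0.0, 0.1, size=(n_transforms * d_act, n_features))
        self.b_enc = np.zeros(n_features)
        # W_dec[f, t] is the decoder block of feature f for transform t.
        self.W_dec = rng.normal(0.0, 0.1, size=(n_features, n_transforms, d_act))
        self.b_dec = np.zeros((n_transforms, d_act))

    def encode(self, acts):
        # acts has shape (n_transforms, d_act); ReLU keeps features non-negative.
        return np.maximum(acts.reshape(-1) @ self.W_enc + self.b_enc, 0.0)

    def decode(self, features):
        # Reconstruct one activation vector per group element.
        return np.einsum("f,ftd->td", features, self.W_dec) + self.b_dec

    def loss(self, acts, l1_coeff=1e-3):
        features = self.encode(acts)
        recon = self.decode(features)
        mse = np.sum((recon - acts) ** 2)
        # Sparsity penalty weighted by the summed per-block decoder norms.
        block_norms = np.linalg.norm(self.W_dec, axis=2).sum(axis=1)
        return mse + l1_coeff * np.sum(features * block_norms)


rng = np.random.default_rng(1)
image = rng.normal(size=(6, 6))
readout = rng.normal(size=(36, 5))  # hypothetical stand-in for a network layer

views = square_symmetries(image)
acts = np.stack([view.reshape(-1) @ readout for view in views])  # shape (8, 5)

model = ToyGroupCrosscoder(n_transforms=8, d_act=5, n_features=16)
features = model.encode(acts)
recon = model.decode(features)
loss_value = model.loss(acts)
```

In this layout, the slice `model.W_dec[f]` collects how a single dictionary feature `f` is written out under each group element. A per-feature comparison of those blocks is one plausible starting point for the kind of symmetry characterisation the abstract describes, though the weights here are random and untrained.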