Measuring nonlinear feature interaction is an established approach to understanding complex patterns of attribution in many models. In this paper, we use Shapley Taylor interaction indices (STII) to analyze the impact of underlying data structure on model representations across a variety of modalities, tasks, and architectures. Considering linguistic structure in masked and auto-regressive language models (MLMs and ALMs), we find that STII increases within idiomatic expressions and that, in MLMs, STII scales with syntactic distance, so that MLMs rely more on syntax in their nonlinear structure than ALMs do. Our speech model findings reflect the phonetic principle that the openness of the oral cavity determines how much a phoneme varies based on its context. Finally, we study image classifiers and illustrate that feature interactions intuitively reflect object boundaries. Our wide range of results illustrates the benefits of interdisciplinary work and domain expertise in interpretability research.
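The abstract does not spell out how an interaction index is computed. As a minimal sketch, assuming the standard order-2 Shapley Taylor definition, the pairwise index for features i and j averages the discrete second derivative f(T ∪ {i, j}) − f(T ∪ {i}) − f(T ∪ {j}) + f(T) over all subsets T of the remaining features, weighted by 1 / C(n − 1, |T|) and scaled by 2 / n. The names `pairwise_stii`, `value_fn`, and `toy_value` are hypothetical and chosen only for illustration; `value_fn` stands in for a model evaluated with only the features in a coalition left unmasked. The enumeration is exact and exponential in n, so it is only practical for a handful of features.

```python
from itertools import combinations
from math import comb


def pairwise_stii(value_fn, n, i, j):
    """Exact order-2 Shapley Taylor interaction index for the pair (i, j).

    value_fn maps a frozenset of "present" feature indices to a scalar
    (for example, a model output with all other features masked).
    """
    others = [p for p in range(n) if p not in (i, j)]
    total = 0.0
    for size in range(len(others) + 1):
        # Weight shared by every subset T of this size.
        weight = 1.0 / comb(n - 1, size)
        for subset in combinations(others, size):
            t = frozenset(subset)
            # Discrete second derivative of value_fn at T with respect to i and j.
            delta = (
                value_fn(t | {i, j})
                - value_fn(t | {i})
                - value_fn(t | {j})
                + value_fn(t)
            )
            total += weight * delta
    return 2.0 / n * total


def toy_value(coalition):
    # Hypothetical toy model: output is 1 only when features 0 and 1 co-occur.
    return 1.0 if {0, 1} <= coalition else 0.0


if __name__ == "__main__":
    print(pairwise_stii(toy_value, 3, 0, 1))  # features 0 and 1 interact
    print(pairwise_stii(toy_value, 3, 0, 2))  # features 0 and 2 do not
```

In the toy model, all of the output is attributable to the joint presence of features 0 and 1, so the index assigns the full interaction to that pair and none to pairs involving feature 2.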