Language models perform well on grammatical agreement, but it is unclear whether this reflects rule-based generalization or memorization. We study this question for German definite singular articles, whose forms depend on gender and case. Using GRADIEND, a gradient-based interpretability method, we learn parameter update directions for gender-case-specific article transitions. We find that updates learned for a specific gender-case article transition frequently affect unrelated gender-case settings, with substantial overlap among the most affected neurons across settings. These results argue against a strictly rule-based encoding of German definite articles, indicating that models at least partly rely on memorized associations rather than abstract grammatical rules.
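To make the notion of overlap among the most affected neurons concrete, the following is a minimal sketch of one way such an overlap could be measured. It is not the GRADIEND implementation: the function names, the choice of ranking neurons by absolute update magnitude, the cutoff `k`, and the toy vectors are all assumptions made for illustration.

```python
import numpy as np


def top_k_neurons(update, k):
    """Return the indices of the k entries with the largest absolute update."""
    update = np.asarray(update, dtype=float)
    # argpartition places the k largest-magnitude entries in the last k slots.
    idx = np.argpartition(np.abs(update), -k)[-k:]
    return set(idx.tolist())


def top_k_overlap(update_a, update_b, k):
    """Fraction of the top-k neurons shared by two update directions."""
    shared = top_k_neurons(update_a, k) & top_k_neurons(update_b, k)
    return len(shared) / k


# Hypothetical per-neuron update magnitudes for two article transitions.
# These numbers are invented for illustration, not taken from the paper.
update_der_to_den = np.array([0.90, -0.10, 0.05, -0.80, 0.02, 0.70])
update_die_to_der = np.array([0.85, 0.03, -0.60, -0.75, 0.01, 0.10])

overlap = top_k_overlap(update_der_to_den, update_die_to_der, k=3)
print(f"top-3 overlap: {overlap:.2f}")
```

Under a strictly rule-based encoding one might expect this fraction to stay low for unrelated gender-case settings; a high value across many pairs would be consistent with the shared, non-specific updates described above.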