Deep architectures such as Transformers are sometimes criticized for having uninterpretable "black-box" representations. We use causal intervention analysis to show that, in fact, some linguistic features are represented in a linear, interpretable format. Specifically, we show that BERT's ability to conjugate verbs relies on a linear encoding of subject number that can be manipulated with predictable effects on conjugation accuracy. This encoding is found in the subject position at the first layer and the verb position at the last layer, but distributed across positions at middle layers, particularly when there are multiple cues to subject number.
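To make the idea of manipulating a linear encoding concrete, the following is a minimal sketch on synthetic data, not the procedure used in this work. It assumes subject number is encoded along a single direction, estimated here as the difference between class centroids (an illustrative choice; the abstract does not specify how the encoding is identified). The names `number_direction` and `intervene` are hypothetical, and random vectors stand in for BERT hidden states.

```python
import numpy as np

def number_direction(singular, plural):
    """Unit vector pointing from the singular centroid to the plural centroid.

    Assumption: a difference-of-means direction stands in for whatever
    linear encoding of subject number the model actually uses.
    """
    diff = plural.mean(axis=0) - singular.mean(axis=0)
    return diff / np.linalg.norm(diff)

def intervene(hidden, direction, midpoint, alpha=2.0):
    """Shift each row of `hidden` along `direction` only.

    alpha=1.0 moves a vector onto the boundary between the two classes;
    alpha=2.0 reflects it to the opposite side (singular <-> plural).
    """
    offset = (hidden - midpoint) @ direction  # signed distance from the boundary
    return hidden - alpha * np.outer(offset, direction)

# Synthetic stand-ins for hidden states: number is encoded on axis 0.
rng = np.random.default_rng(0)
dim = 16
true_axis = np.zeros(dim)
true_axis[0] = 1.0
singular = 0.1 * rng.normal(size=(50, dim)) - true_axis
plural = 0.1 * rng.normal(size=(50, dim)) + true_axis

direction = number_direction(singular, plural)
midpoint = 0.5 * (singular.mean(axis=0) + plural.mean(axis=0))

flipped = intervene(singular, direction, midpoint)
before = (singular - midpoint) @ direction  # negative: singular side
after = (flipped - midpoint) @ direction    # positive: plural side
```

In an actual experiment, the modified vectors would be written back into the model at a chosen layer and token position, and the effect on conjugation accuracy measured. If the encoding is linear and causally used, reflecting singular-subject representations should push the model toward plural verb forms.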