The Backpack is a Transformer alternative shown to improve interpretability in English language modeling by decomposing predictions into a weighted sum of token sense components. However, Backpacks' reliance on token-defined meaning raises questions about their potential for languages other than English, a language for which subword tokenization provides a reasonable approximation of lexical items. In this work, we train, evaluate, interpret, and control Backpack language models in character-tokenized Chinese, in which words are often composed of many characters. We find that our 134M-parameter Chinese Backpack language model performs comparably to a 104M-parameter Transformer, and learns rich character-level meanings that log-additively compose to form word meanings. In SimLex-style lexical semantic evaluations, simple averages of Backpack character senses outperform input embeddings from a Transformer. We find that complex multi-character meanings are often formed by using the same per-character sense weights consistently across contexts. Exploring interpretability-through-control, we show that we can localize a source of gender bias in our Backpacks to specific character senses and intervene to reduce the bias.
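As a rough illustration of the decomposition described above, the following sketch computes a Backpack-style output as a weighted sum of per-character sense vectors and checks that each sense's contribution adds up in logit space, which is the sense in which composition is log-additive. All sizes, the random sense vectors, the random contextual weights, and the function names are assumptions made for illustration; in an actual Backpack the weights are produced by a contextualizing network, and nothing here reproduces the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes, not the paper's configuration.
vocab_size, n_senses, d_model, seq_len = 50, 4, 8, 3

# One d_model vector per (character, sense) pair; random stand-ins here.
sense_vectors = rng.normal(size=(vocab_size, n_senses, d_model))
# Output embedding mapping a d_model vector to vocabulary logits.
output_embedding = rng.normal(size=(d_model, vocab_size))


def backpack_output(token_ids, weights):
    """Weighted sum of sense vectors.

    token_ids: (seq_len,) integer character ids.
    weights:   (n_senses, seq_len, seq_len); weights[l, i, j] is the weight
               of sense l of character j when predicting at position i.
    Returns an array of shape (seq_len, d_model).
    """
    senses = sense_vectors[token_ids]  # (seq_len, n_senses, d_model)
    return np.einsum("lij,jld->id", weights, senses)


def sense_logit_contributions(token_ids, weights, position):
    """Logit contribution of every (character, sense) pair at `position`."""
    senses = sense_vectors[token_ids]                # (seq_len, n_senses, d_model)
    per_sense_logits = senses @ output_embedding     # (seq_len, n_senses, vocab)
    w = weights[:, position, :].T                    # (seq_len, n_senses)
    return w[:, :, None] * per_sense_logits          # (seq_len, n_senses, vocab)


token_ids = np.array([3, 17, 42])
# Assumed stand-in for contextual weights: random, non-negative, causal.
weights = rng.random(size=(n_senses, seq_len, seq_len))
weights *= np.tril(np.ones((seq_len, seq_len)))

outputs = backpack_output(token_ids, weights)
logits = outputs @ output_embedding
contributions = sense_logit_contributions(token_ids, weights, position=2)

# The logits at a position equal the sum of the per-sense contributions.
assert np.allclose(logits[2], contributions.sum(axis=(0, 1)))
```

Because the logits are linear in the sense vectors, scaling or zeroing a single entry of `weights` changes the prediction by exactly that sense's contribution, which is what makes sense-level interventions of the kind mentioned in the abstract tractable.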