We study last-layer outlier dimensions, i.e., dimensions that display extreme activations for the majority of inputs. We show that outlier dimensions arise in many different modern language models, and we trace their function back to the heuristic of constantly predicting frequent words. We further show how a model can block this heuristic when it is not contextually appropriate: it assigns a counterbalancing weight mass to the remaining dimensions. We also investigate which model parameters boost outlier dimensions and when they arise during training. We conclude that outlier dimensions are a specialized mechanism, discovered independently by many distinct models, for implementing a useful token-prediction heuristic.
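The working definition above (a dimension is an outlier if its activation is extreme for a majority of inputs) can be sketched operationally. The following is a minimal illustration, not the paper's exact procedure: the magnitude threshold (6.0) and the majority fraction (0.5) are assumptions chosen for the toy demo, and real analyses would use activations collected from an actual model rather than synthetic data.

```python
import numpy as np

def find_outlier_dimensions(acts, threshold=6.0, frac=0.5):
    """Return indices of dimensions whose absolute activation exceeds
    `threshold` for more than `frac` of the inputs.

    acts: array of shape (n_inputs, hidden_size), one last-layer
    activation vector per input.
    """
    extreme = np.abs(acts) > threshold   # per-(input, dim) extremeness mask
    share = extreme.mean(axis=0)         # fraction of inputs extreme, per dim
    return np.flatnonzero(share > frac)

# Toy demo: 200 inputs, 16 hidden dims; dim 3 carries a large constant
# offset, mimicking an outlier dimension.
rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=(200, 16))
acts[:, 3] += 50.0
print(find_outlier_dimensions(acts))     # → [3]
```

The majority criterion (`frac`) is what separates outlier *dimensions*, which are extreme for most inputs, from ordinary sparse outlier activations that spike only on particular tokens.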