Prior work has shown the existence of contextual neurons in language models, including a neuron that activates on German text. We show that this neuron exists within a broader contextual n-gram circuit: we find late-layer neurons which recognize and continue n-grams common in German text, but which only activate if the German neuron is active. We investigate the formation of this circuit throughout training and find that it is an example of what we call a second-order circuit. In particular, both the constituent n-gram circuits and the German detection circuit, which culminates in the German neuron, form with independent functions early in training: the German detection circuit partially through modeling German unigram statistics, and the n-gram circuits by boosting appropriate completions. Only after both circuits have formed do they fit together into a second-order circuit. Contrary to the hypotheses presented in prior work, we find that the contextual n-gram circuit forms gradually rather than in a sudden phase transition. We further present a range of anomalous observations, such as a simultaneous phase transition in many tasks coinciding with the learning rate warm-up, and evidence that many context neurons form simultaneously early in training but are later unlearned.