Prior work has found that pretrained language models (LMs) fine-tuned with different random seeds can achieve similar in-domain performance but generalize differently on tests of syntactic generalization. In this work, we show that, even within a single model, we can find multiple subnetworks that perform similarly in-domain but generalize vastly differently. To better understand these phenomena, we investigate whether they can be explained in terms of "competing subnetworks": the model initially represents a variety of distinct algorithms, corresponding to different subnetworks, and generalization occurs when it ultimately converges to one. This explanation has been used to account for generalization in simple algorithmic tasks. Instead of finding competing subnetworks, we find that all subnetworks -- whether they generalize or not -- share a set of attention heads, which we refer to as the heuristic core. Further analysis suggests that these attention heads emerge early in training and compute shallow, non-generalizing features. The model learns to generalize by incorporating additional attention heads, which depend on the outputs of the "heuristic" heads to compute higher-level features. Overall, our results offer a more detailed picture of the mechanisms for syntactic generalization in pretrained LMs.