Recent work has provided indirect evidence that pretraining language models on code improves their ability to track state changes of discourse entities expressed in natural language. In this work, we systematically test this claim by comparing pairs of language models on their entity tracking performance. Critically, each pair consists of a base model and a model trained on top of that base model with additional code data. We extend this analysis to examine the effect of training on math, another highly structured data type, and of alignment tuning, an important step for enhancing the usability of models. We find clear evidence that models additionally trained on large amounts of code outperform their base models. In contrast, we find no consistent benefit of additional math training or alignment tuning across various model families.