We investigate the integration of human-like working memory constraints into the Transformer architecture and implement several cognitively inspired attention variants, including fixed-width window-based and temporal-decay-based attention mechanisms. Our modified GPT-2 models are trained from scratch on developmentally plausible datasets (10M and 100M words). We evaluate performance on grammatical judgment tasks (BLiMP) and on alignment with human reading-time data. Our results indicate that these cognitively inspired constraints, particularly fixed-width attention, can significantly improve grammatical accuracy, especially when training data is scarce. These constrained models also tend to show stronger alignment with human processing metrics. The findings suggest that such constraints may serve as a beneficial inductive bias, guiding models toward more robust linguistic representations, especially in data-limited settings.
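The two families of constraints named in this abstract can both be expressed as additive biases on the causal attention logits. The following is a minimal single-head NumPy sketch under our own assumptions, not the formulation used in the modified GPT-2 models: the window size, the linear decay rate, and the helper names (`window_bias`, `decay_bias`, `attention`) are hypothetical. The decay variant shown here is an ALiBi-style linear penalty, which multiplies the unnormalized weight of a token at distance `d` by `exp(-rate * d)`; other decay forms are equally plausible.

```python
import numpy as np

def window_bias(seq_len: int, window: int) -> np.ndarray:
    """Fixed-width causal window: query i may attend to keys j with i - window < j <= i."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    allowed = (j <= i) & (j > i - window)
    return np.where(allowed, 0.0, -np.inf)

def decay_bias(seq_len: int, rate: float) -> np.ndarray:
    """Causal temporal decay (assumed linear in the logits): bias = -rate * (i - j)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    dist = (i - j).astype(float)
    return np.where(j <= i, -rate * dist, -np.inf)

def attention(q, k, v, bias):
    """Single-head scaled dot-product attention with an additive bias matrix."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d) + bias
    # Subtract the row max for numerical stability; masked entries stay -inf.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)  # exp(-inf) == 0, so masked keys get zero weight
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Usage: compare the two constraints on random queries, keys, and values.
rng = np.random.default_rng(0)
T, d = 6, 4
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
_, w_win = attention(q, k, v, window_bias(T, window=3))
_, w_dec = attention(q, k, v, decay_bias(T, rate=0.5))
```

With `window=3`, the last token can only see itself and the two preceding tokens, a hard memory limit; the decay variant never fully forgets distant tokens but down-weights them smoothly, which is one way to contrast the two inductive biases.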