Humans can learn a new word and infer its grammatical properties from very few examples. They hold an abstract notion of linguistic properties such as grammatical gender and agreement rules that can be applied to novel syntactic contexts and words. Drawing inspiration from psycholinguistics, we conduct a noun learning experiment to assess whether an LSTM and a decoder-only transformer can achieve human-like abstraction of grammatical gender in French. The language models were tasked with learning the gender of a novel noun embedding from a few examples in one grammatical agreement context and predicting agreement in another, unseen context. We find that both language models effectively generalise novel noun gender from as few as one or two learning examples and apply the learnt gender across agreement contexts, albeit with a bias toward the masculine gender category. Importantly, the few-shot updates were applied only to the embedding layers, demonstrating that the models encode sufficient gender information within the word embedding space. While the generalisation behaviour of the models suggests that they represent grammatical gender as an abstract category, like humans, further work is needed to explore exactly how this is implemented. For a comparative perspective with human behaviour, we conducted an analogous one-shot novel noun gender learning experiment, which revealed that native French speakers, like the language models, exhibit a masculine gender bias and are not excellent one-shot learners either.
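The embedding-only update regime described above can be illustrated with a minimal sketch. This is not the paper's implementation: the tiny LSTM, vocabulary, and the nonce token "wug" are all hypothetical stand-ins, chosen only to show how every parameter except the embedding table is frozen before a few-shot gradient step on an agreement-context example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy vocabulary; "wug" stands in for a novel French noun
# whose grammatical gender must be learnt from context.
vocab = ["<pad>", "le", "la", "un", "une", "wug", "est", "petit", "petite"]
stoi = {w: i for i, w in enumerate(vocab)}

class TinyLM(nn.Module):
    """A minimal LSTM language model, standing in for the real models."""
    def __init__(self, vocab_size, dim=16):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, x):
        h, _ = self.lstm(self.emb(x))
        return self.out(h)

model = TinyLM(len(vocab))

# Freeze everything except the embedding table, mirroring the setup in
# which few-shot updates touch only the embedding layer.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("emb")

opt = torch.optim.SGD([model.emb.weight], lr=0.1)

# One learning example in a determiner-agreement context: "la wug est ...",
# trained with a standard next-token language-modelling loss.
ids = torch.tensor([[stoi["la"], stoi["wug"], stoi["est"]]])
logits = model(ids[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, len(vocab)), ids[:, 1:].reshape(-1)
)

before = model.lstm.weight_ih_l0.clone()
loss.backward()
opt.step()

# Only the embedding weights moved; the LSTM and output head are untouched.
assert torch.equal(before, model.lstm.weight_ih_l0)
```

After the update, only rows of `model.emb.weight` can have changed, so any improvement in agreement prediction for the novel noun must come from information stored in its embedding.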