Verbing Weirds Language (Models): Evaluation of English Zero-Derivation in Five LLMs

Lexical-syntactic flexibility, in the form of conversion (or zero-derivation), is a hallmark of English morphology. In conversion, a word with one part of speech is placed in a non-prototypical context, where it is coerced to behave as if it had a different part of speech. Although this process affects a large part of the English lexicon, little work has been done to establish the degree to which language models capture this type of generalization. This paper reports the first study on the behavior of large language models with reference to conversion. We design a task for testing lexical-syntactic flexibility, that is, the degree to which models can generalize over words in a construction with a non-prototypical part of speech. This task is situated within a natural language inference paradigm. We test five language models: two proprietary models (GPT-3.5 and GPT-4) and three open-source models (Mistral 7B, Falcon 40B, and Llama 2 70B). We find that GPT-4 performs best on the task, followed by GPT-3.5, but that the open-source language models are also able to perform it. Notably, the 7B-parameter Mistral shows as small a gap between its baseline performance on the natural language inference task and its performance on the non-prototypical syntactic category task as the much larger GPT-4.
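The comparison between baseline and non-prototypical performance can be illustrated with a minimal sketch. It assumes the gap is a simple difference in accuracy between the two conditions; the abstract does not state the exact metric, and the function names and the toy labels below are hypothetical, not data from the study.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predicted NLI labels that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def flexibility_gap(
    baseline_preds: Sequence[str],
    baseline_gold: Sequence[str],
    converted_preds: Sequence[str],
    converted_gold: Sequence[str],
) -> float:
    """Baseline accuracy minus accuracy on non-prototypical items.

    Assumption: a smaller gap means the model is less affected when a
    word appears with a non-prototypical part of speech.
    """
    return accuracy(baseline_preds, baseline_gold) - accuracy(
        converted_preds, converted_gold
    )


# Toy, made-up labels for illustration only (not results from the paper).
gold = ["entailment", "contradiction", "entailment", "neutral"]
baseline = ["entailment", "contradiction", "entailment", "neutral"]
converted = ["entailment", "contradiction", "neutral", "neutral"]

gap = flexibility_gap(baseline, gold, converted, gold)
print(f"gap = {gap:.2f}")
```

Under this reading, two models of very different size can show the same gap even if their absolute accuracies differ, which is the kind of comparison the abstract draws between Mistral 7B and GPT-4.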

