Prompt-based methods may underestimate large language models' linguistic generalizations

Prompting is now a dominant method for evaluating the linguistic knowledge of large language models (LLMs). While other methods directly read out models' probability distributions over strings, prompting requires models to access this internal information by processing linguistic input, thereby implicitly testing a new type of emergent ability: metalinguistic judgment. In this study, we compare metalinguistic prompting and direct probability measurements as ways of measuring models' knowledge of English. Broadly, we find that LLMs' metalinguistic judgments are inferior to quantities directly derived from representations. Furthermore, consistency gets worse as the prompt diverges from direct measurements of next-word probabilities. Our findings suggest that negative results relying on metalinguistic prompts cannot be taken as conclusive evidence that an LLM lacks a particular linguistic competence. Our results also highlight the value lost in the move to closed APIs, where access to probability distributions is limited.

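As a rough illustration of what "direct probability measurement" means here, the sketch below scores a minimal pair of sentences by summing next-word log-probabilities and checks which sentence the model prefers. It is not the authors' code: the toy bigram model, the tiny corpus, and the names `train_bigram`, `sentence_logprob`, and `prefers_first` are all assumptions made for illustration, standing in for an LLM's next-word distribution. A metalinguistic prompt would instead ask the model a question about the sentences and read off its answer.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for a model's training data.
CORPUS = [
    "the keys are on the table",
    "the key is on the table",
    "the keys to the cabinet are here",
    "the key to the cabinet is here",
]


def train_bigram(corpus):
    """Count contexts and bigrams; returns (context_counts, bigram_counts, vocab)."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for line in corpus:
        tokens = ["<s>"] + line.split()
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams, vocab


def next_word_logprob(model, prev, cur):
    """log P(cur | prev) with add-one smoothing over the vocabulary."""
    contexts, bigrams, vocab = model
    return math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab)))


def sentence_logprob(model, sentence):
    """Direct measurement: sum of next-word log-probabilities (chain rule)."""
    tokens = ["<s>"] + sentence.split()
    return sum(next_word_logprob(model, p, c) for p, c in zip(tokens, tokens[1:]))


def prefers_first(model, first, second):
    """True if the model assigns higher probability to `first` than to `second`."""
    return sentence_logprob(model, first) > sentence_logprob(model, second)


model = train_bigram(CORPUS)
good = "the keys are on the table"
bad = "the keys is on the table"
print(prefers_first(model, good, bad))
```

With an actual LLM, `next_word_logprob` would be replaced by the model's own next-token log-probabilities, which is exactly the information that is unavailable when an API exposes only generated text.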
