Recent advances in large language models have prompted researchers to examine their abilities across a variety of linguistic tasks, but little has been done to investigate how models handle the interactions in meaning across words and larger syntactic forms, i.e., phenomena at the intersection of syntax and semantics. We present the semantic notion of agentivity as a case study for probing such interactions. We created a novel evaluation dataset by utilizing the unique linguistic properties of a subset of optionally transitive English verbs. We used this dataset to prompt models of varying sizes from three model classes to see whether they are sensitive to agentivity at the lexical level, and whether they can appropriately employ these word-level priors given a specific syntactic context. Overall, GPT-3 text-davinci-003 performs extremely well across all experiments, outperforming all other models tested by far. In fact, its results are even better correlated with human judgements than both syntactic and semantic corpus statistics. This suggests that, for certain tasks, LMs may serve as more useful tools for linguistic annotation, theory testing, and discovery than select corpora.