The performance of large language models (LLMs) has recently improved to the point where the models can generate valid and coherent meta-linguistic analyses of language data. This paper illustrates the considerable potential of analyzing the meta-linguistic abilities of LLMs. LLMs are primarily trained on language data in the form of text; analyzing their meta-linguistic abilities is informative both for our understanding of the general capabilities of LLMs and for models of linguistics. We propose several types of experiments and prompt designs that allow us to analyze the ability of GPT-4 to generate meta-linguistic analyses. We focus on three linguistics subfields with formalisms that allow for a detailed analysis of GPT-4's theoretical capabilities: theoretical syntax, phonology, and semantics. We identify types of experiments, provide general guidelines, discuss limitations, and offer future directions for this research program.