The performance of large language models (LLMs) has recently improved to the point where the models perform well on many language tasks. We show here that, for the first time, the models can also generate coherent and valid formal analyses of linguistic data, and we illustrate the vast potential of LLMs for analyses of their metalinguistic abilities. LLMs are primarily trained on language data in the form of text; analyzing and evaluating their metalinguistic abilities improves our understanding of their general capabilities and sheds new light on theoretical models in linguistics. In this paper, we probe GPT-4's metalinguistic capabilities by focusing on three subfields of formal linguistics: syntax, phonology, and semantics. We outline a research program for metalinguistic analyses of large language models, propose experimental designs, provide general guidelines, discuss limitations, and offer future directions for this line of research. This line of inquiry also exemplifies behavioral interpretability of deep learning, where models' representations are accessed by explicit prompting rather than by inspecting internal representations.