Years have passed since the NLP community last focused on linguistic acceptability. In this work, we revisit the topic in the context of large language models. We introduce CoLAC, the Corpus of Linguistic Acceptability in Chinese, the first large-scale non-English acceptability dataset that is verified by native speakers and comes with two sets of labels. Our experiments show that even the largest InstructGPT model performs only at chance level on CoLAC, while ChatGPT's performance (48.30 MCC) is also well below that of supervised models (59.03 MCC) and humans (65.11 MCC). Through cross-lingual transfer experiments and fine-grained linguistic analysis, we demonstrate for the first time that knowledge of linguistic acceptability can be transferred across typologically distinct languages and can be traced back to pre-training.
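For readers unfamiliar with the metric, the following is a minimal sketch of how the Matthews correlation coefficient (MCC) is computed for binary acceptability labels. MCC lies in [-1, 1], so the scores quoted above are presumably scaled by 100; that reading is an assumption, as is the encoding of 1 for acceptable and 0 for unacceptable. The function name `mcc` and the toy labels are illustrative only and are not taken from CoLAC.

```python
import math


def mcc(gold, pred):
    """Matthews correlation coefficient for binary labels.

    Assumed encoding (hypothetical): 1 = acceptable, 0 = unacceptable.
    Returns a value in [-1, 1]; 0 corresponds to chance-level prediction.
    """
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # true positives
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)  # true negatives
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # false positives
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # false negatives

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is defined as 0 when any marginal count is zero.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels, made up for illustration (not CoLAC data).
gold = [1, 1, 0, 0]
perfect = [1, 1, 0, 0]
always_acceptable = [1, 1, 1, 1]
uninformative = [1, 0, 1, 0]
```

On these toy labels, a predictor that labels every sentence acceptable scores 0 even though its accuracy is 50%, which is why MCC is preferred over accuracy when describing chance-level behaviour on acceptability judgments.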