In this work, we revisit linguistic acceptability in the context of large language models. We introduce CoLAC (Corpus of Linguistic Acceptability in Chinese), the first large-scale acceptability dataset for a non-Indo-European language. It is verified by native speakers and is the first acceptability dataset to come with two sets of labels: a linguist label and a crowd label. Our experiments show that even the largest InstructGPT model performs only at chance level on CoLAC, while ChatGPT's performance (48.30 MCC) falls well below that of supervised models (59.03 MCC) and humans (65.11 MCC). Through cross-lingual transfer experiments and fine-grained linguistic analysis, we examine the model predictions in detail and demonstrate for the first time that knowledge of linguistic acceptability can be transferred across typologically distinct languages and can be traced back to pre-training. Our dataset is publicly available at \url{https://github.com/huhailinguist/CoLAC}.
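The scores above are reported in MCC. As a minimal sketch, assuming this is the standard binary Matthews correlation coefficient scaled by 100, the metric can be computed from acceptability judgments as follows. The function name `matthews_corrcoef` and the toy labels are illustrative assumptions and are not taken from CoLAC or its evaluation code.

```python
import math


def matthews_corrcoef(gold, pred):
    """Binary MCC for labels in {0, 1} (1 = acceptable, 0 = unacceptable).

    Hypothetical helper for illustration; not the paper's evaluation script.
    """
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Degenerate case (e.g. the model predicts a single class):
        # by convention the score is 0.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels, invented for this example only.
gold = [1, 1, 1, 0, 0, 1, 0, 1]
pred = [1, 1, 0, 0, 1, 1, 0, 1]

score = matthews_corrcoef(gold, pred)
print(f"MCC = {100 * score:.2f}")

# A model that labels every sentence "acceptable" scores 0.
always_acceptable = matthews_corrcoef(gold, [1] * len(gold))
print(f"Always-acceptable MCC = {100 * always_acceptable:.2f}")
```

MCC ranges from -100 to 100 on this scale, and an uninformative predictor scores about 0 regardless of how skewed the label distribution is. This is why a chance-level result is a meaningful finding on a dataset where acceptable sentences may outnumber unacceptable ones.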