Computational Modelling of Plurality and Definiteness in Chinese Noun Phrases

Theoretical linguists have suggested that some languages (e.g., Chinese and Japanese) are "cooler" than other languages based on the observation that the intended meaning of phrases in these languages depends more on their contexts. As a result, many expressions in these languages are shortened, and their meaning is inferred from the context. In this paper, we focus on the omission of the plurality and definiteness markers in Chinese noun phrases (NPs) to investigate the predictability of their intended meaning given the contexts. To this end, we built a corpus of Chinese NPs, each of which is accompanied by its corresponding context, and by labels indicating its singularity/plurality and definiteness/indefiniteness. We carried out corpus assessments and analyses. The results suggest that Chinese speakers indeed drop plurality and definiteness markers very frequently. Building on the corpus, we train a bank of computational models using both classic machine learning models and state-of-the-art pre-trained language models to predict the plurality and definiteness of each NP. We report on the performance of these models and analyse their behaviours.

翻译：理论语言学家指出，基于某些语言（如中文和日语）中短语的预期含义更依赖语境这一观察，这些语言比其他语言"更冷"。因此，这些语言中的许多表达被简化，其含义通过语境推断得出。本文聚焦中文名词短语中复数与定指性标记的省略现象，探究其预期含义在给定语境中的可预测性。为此，我们构建了一个中文名词短语语料库，每个短语附有对应语境，并标注其单复数与定指/非定指属性。通过语料评估与分析，结果表明中文使用者确实非常频繁地省略复数与定指性标记。基于该语料库，我们训练了一系列计算模型，采用经典机器学习模型与最先进的预训练语言模型来预测每个名词短语的复数与定指属性。我们报告了这些模型的性能表现并分析了其行为特征。

相关内容

MoDELS

关注 46

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

语言视觉预训练语言模型揭密，Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

专知会员服务

36+阅读 · 2020年5月20日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

37+阅读 · 2019年10月17日