Ambiguity is a critical component of language that allows for more effective communication between speakers, but it is often ignored in NLP. Recent work suggests that NLP systems may struggle to grasp certain elements of human language understanding because they may not handle ambiguity at the level that humans naturally do in communication. Additionally, different types of ambiguity may serve different purposes and require different approaches to resolution, and we aim to investigate how language models' abilities vary across these types. We propose a taxonomy of ambiguity types as seen in English to facilitate NLP analysis. Our taxonomy can help make meaningful splits in language ambiguity data, allowing for more fine-grained assessments of both datasets and model performance.