This paper introduces a novel annotation framework for the fine-grained modeling of the genericity of noun phrases (NPs) in natural language. The framework is designed to be simple and intuitive, which makes it accessible to non-expert annotators and suitable for crowdsourced tasks. It draws on the theoretical and cognitive literature on genericity and is grounded in established linguistic theory. In a pilot study, we created a small annotated dataset of 324 sentences that serves as a foundation for future research. To validate our approach, we compared our continuous annotations with existing binary annotations of the same dataset, and the comparison shows that the framework captures nuanced aspects of genericity. Our work offers practical resources to two audiences. For linguists, it provides a first annotated dataset and an annotation scheme for building real-language datasets that can be used in studies on the semantics of genericity. For NLP practitioners, it contributes to the development of commonsense knowledge repositories that can enhance various NLP applications.