Generalized quantifiers (e.g., few, most) indicate the proportion of cases in which a predicate is satisfied (for example, some apples are red). One way to interpret quantifier semantics is to bind this satisfaction explicitly to a percentage scope (e.g., 30%-40% of apples are red). This approach can be helpful for tasks such as logic formalization and surface-form quantitative reasoning (Gordon and Schubert, 2010; Roy et al., 2015). However, it remains unclear whether recent foundation models possess this ability, since they lack direct training signals for it. To explore this, we introduce QuRe, a crowd-sourced dataset of human-annotated generalized quantifiers in Wikipedia sentences featuring percentage-equipped predicates. We examine quantifier comprehension in language models using PRESQUE, a framework that combines natural language inference with the Rational Speech Acts framework. Experimental results on the HVD dataset and QuRe show that PRESQUE, which employs pragmatic reasoning, performs 20% better than a literal reasoning baseline at predicting quantifier percentage scopes, with no additional training required.
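To make the contrast between literal and pragmatic reasoning concrete, the following is a minimal sketch of generic Rational Speech Acts (RSA) inference over quantifiers and percentage bins. It is not the PRESQUE implementation, whose exact formulation is not given here. The function names, the rationality parameter `alpha`, the uniform prior, the five percentage bins, and the score table are all hypothetical; in particular, the scores are made-up placeholders standing in for the output of a natural language inference model, not measured values.

```python
from typing import Dict, List

# Hypothetical percentage bins; the granularity is an assumption.
BINS = ["0-20%", "20-40%", "40-60%", "60-80%", "80-100%"]


def normalize(weights: List[float]) -> List[float]:
    """Scale non-negative weights to sum to 1 (uniform if all are zero)."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def literal_listener(scores: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """L0(bin | quantifier): normalize each quantifier's scores over the bins."""
    return {q: normalize(row) for q, row in scores.items()}


def pragmatic_speaker(l0: Dict[str, List[float]], alpha: float = 1.0) -> Dict[str, List[float]]:
    """S1(quantifier | bin): for each bin, normalize L0 ** alpha over quantifiers."""
    quantifiers = list(l0)
    n_bins = len(next(iter(l0.values())))
    s1 = {q: [0.0] * n_bins for q in quantifiers}
    for i in range(n_bins):
        column = normalize([l0[q][i] ** alpha for q in quantifiers])
        for q, value in zip(quantifiers, column):
            s1[q][i] = value
    return s1


def pragmatic_listener(s1: Dict[str, List[float]], prior: List[float]) -> Dict[str, List[float]]:
    """L1(bin | quantifier): weight S1 by a prior over bins, then normalize over bins."""
    return {q: normalize([s * p for s, p in zip(row, prior)]) for q, row in s1.items()}


# Placeholder scores standing in for NLI entailment probabilities (illustrative only).
scores = {
    "few":  [0.6, 0.3, 0.1, 0.0, 0.0],
    "some": [0.3, 0.4, 0.2, 0.1, 0.0],
    "most": [0.0, 0.1, 0.2, 0.4, 0.3],
}
prior = [1.0 / len(BINS)] * len(BINS)  # uniform prior over bins (an assumption)

l0 = literal_listener(scores)
s1 = pragmatic_speaker(l0, alpha=1.0)
l1 = pragmatic_listener(s1, prior)

for quantifier in scores:
    best_bin = max(range(len(BINS)), key=lambda i: l1[quantifier][i])
    print(quantifier, BINS[best_bin], [round(p, 3) for p in l1[quantifier]])
```

In this toy setting, the pragmatic listener shifts probability for "some" away from the lowest bin, because a speaker describing that bin would more likely have chosen "few". A literal reading alone does not capture this kind of competition between quantifiers.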