What distinguishes a presupposition of an utterance (information taken for granted by its speaker) from other pragmatic inferences, such as an entailment, is projectivity: a presupposition survives when its sentence is embedded under an entailment-canceling operator. For example, the negative sentence "the boy did not stop shedding tears" still presupposes that the boy had shed tears before. The degree of projectivity may vary depending on the combination of presupposition trigger and environment. However, prior natural language understanding studies fail to take this into account, because they either use no human baseline or include only negation as an entailment-canceling environment when evaluating models. The current study attempts to address these issues. We introduce a new dataset, projectivity of presupposition (PROPRES), which includes 12k premise-hypothesis pairs crossing six triggers, with some lexical variety within each, with five environments. Our human evaluation reveals that humans exhibit variable projectivity in some cases. However, the model evaluation shows that the best-performing model, DeBERTa, does not fully capture this variability. Our findings suggest that probing studies on pragmatic inferences should pay particular attention to the variability of human judgments and to the combination of linguistic items.