Nonsensical and anomalous sentences have been instrumental in the development of computational models of semantic interpretation. A core challenge is to distinguish between what is merely anomalous (but can be interpreted given a supporting context) and what is truly nonsensical. However, it is unclear (a) how nonsensical, rather than merely anomalous, existing datasets are; and (b) how well LLMs can make this distinction. In this paper, we answer both questions by collecting sensicality judgments from human raters and LLMs on sentences from five semantically deviant datasets, both without context and with a supporting context provided. We find that raters consider most sentences at most anomalous, and only a few properly nonsensical. We also show that LLMs are highly capable of generating plausible contexts for anomalous cases.