Nonsensical and anomalous sentences have been instrumental in the development of computational models of semantic interpretation. A core challenge is to distinguish what is merely anomalous (but interpretable given a supporting context) from what is truly nonsensical. However, it is unclear (a) how nonsensical, rather than merely anomalous, existing datasets are; and (b) how well LLMs can make this distinction. In this paper, we answer both questions by collecting sensicality judgments from human raters and LLMs on sentences from five semantically deviant datasets, both in a context-free setting and when a context is provided. We find that raters consider most sentences at most anomalous, and only a few properly nonsensical. We also show that LLMs are highly capable of generating plausible contexts for anomalous cases.