Although large language models (LLMs) have apparently acquired a certain level of grammatical knowledge and the ability to generalize, they fail to interpret negation, a crucial capability in Natural Language Processing. We try to clarify the reasons for the sub-optimal performance of LLMs in understanding negation. We introduce a large, semi-automatically generated dataset of about 400,000 descriptive sentences about commonsense knowledge, each of which can be true or false; about two thirds of the corpus contains negation in different forms. We have used our dataset to evaluate the largest available open LLMs in a zero-shot setting, probing their generalization and inference capabilities, and we have also fine-tuned some of the models to assess whether the understanding of negation can be learned through training. Our findings show that, while LLMs are proficient at classifying affirmative sentences, they struggle with negative sentences and lack a deep understanding of negation, often relying on superficial cues. Although fine-tuning the models on negative sentences improves their performance, their lack of generalization in handling negation persists, highlighting the ongoing challenges LLMs face in understanding negation and generalizing. The dataset and code are publicly available.