Negation is a fundamental aspect of natural language, playing a critical role in communication and comprehension. Our study assesses the negation detection performance of Generative Pre-trained Transformer (GPT) models, specifically GPT-2, GPT-3, GPT-3.5, and GPT-4. We focus on the identification of negation in natural language using a zero-shot prediction approach applied to our custom xNot360 dataset. Our approach examines sentence pairs labeled to indicate whether the second sentence negates the first. Our findings expose a considerable performance disparity among the GPT models, with GPT-4 surpassing its counterparts and GPT-3.5 displaying a marked performance reduction. The overall proficiency of the GPT models in negation detection remains relatively modest, indicating that this task pushes the boundaries of their natural language understanding capabilities. We not only highlight the constraints of GPT models in handling negation but also emphasize the importance of logical reliability in high-stakes domains such as healthcare, science, and law.
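The zero-shot setup described above (ask a model whether the second sentence of a pair negates the first, then compare its answer with the label) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual pipeline: the prompt wording, the 0/1 label convention, the helper names (`build_prompt`, `parse_answer`, `evaluate`, `keyword_baseline`), and the four toy pairs are all hypothetical, and the toy pairs are not drawn from xNot360. The model is passed in as a plain callable so that any GPT client could be substituted without changing the harness.

```python
from typing import Callable, Iterable, Tuple

# (sentence 1, sentence 2, label); label 1 means "sentence 2 negates sentence 1".
# The 0/1 convention is an assumption made for this sketch.
Pair = Tuple[str, str, int]


def build_prompt(s1: str, s2: str) -> str:
    """Build a zero-shot prompt: instructions only, no worked examples."""
    return (
        "Does the second sentence negate the first? "
        "Answer 1 for yes or 0 for no.\n"
        f"Sentence 1: {s1}\n"
        f"Sentence 2: {s2}\n"
        "Answer:"
    )


def parse_answer(text: str) -> int:
    """Return the first 0 or 1 found in the model's reply."""
    for ch in text:
        if ch in "01":
            return int(ch)
    # Assumption: an unparseable reply is counted as "no negation".
    return 0


def evaluate(pairs: Iterable[Pair], model: Callable[[str], str]) -> float:
    """Accuracy of `model` over labeled sentence pairs."""
    items = list(pairs)
    correct = sum(
        parse_answer(model(build_prompt(s1, s2))) == label
        for s1, s2, label in items
    )
    return correct / len(items)


def keyword_baseline(prompt: str) -> str:
    """Toy stand-in for a language model: says "1" if sentence 2 contains 'not'."""
    second = prompt.split("Sentence 2:")[1]
    return "1" if " not " in f" {second.lower()} " else "0"


# Hypothetical pairs written for this sketch (not taken from xNot360).
TOY_PAIRS = [
    ("The door is open.", "The door is not open.", 1),
    ("The cat is asleep.", "The cat is on the sofa.", 0),
    ("It is raining.", "It is not true that it is sunny.", 0),
    ("He never lies.", "He sometimes lies.", 1),
]

if __name__ == "__main__":
    print(f"keyword baseline accuracy: {evaluate(TOY_PAIRS, keyword_baseline):.2f}")
```

The keyword baseline is deliberately naive: it is misled by a "not" that does not negate the first sentence and misses a negation expressed without "not", which illustrates why surface cues alone do not settle the task. To evaluate a real model, `keyword_baseline` would be replaced with a function that sends the prompt to that model and returns its text reply.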