Recent research on Large Language Models (LLMs) has shown promising progress in aligning them with human preferences. LLM-empowered decision-making systems are expected to be predictable, reliable and trustworthy, which implies being free from paradoxes or contradictions that could undermine their credibility and validity. However, LLMs still exhibit inconsistent and biased behaviour when making decisions or judgements. In this work, we focus on studying the logical consistency of LLMs as a prerequisite for more reliable and trustworthy systems. Logical consistency ensures that decisions are based on a stable and coherent understanding of the problem, reducing the risk of erratic or contradictory outputs. We first propose a universal framework for quantifying logical consistency via three fundamental proxies: transitivity, commutativity and negation invariance. We then use the defined measures to evaluate the logical consistency of a wide range of LLMs, demonstrating that it can serve as a strong proxy for overall robustness. Additionally, we introduce a data refinement and augmentation technique that enhances the logical consistency of LLMs without sacrificing alignment with human preferences. The technique augments noisy, sparse pairwise-comparison annotations by estimating partially or totally ordered preference rankings using rank-aggregation methods. Finally, we show that logical consistency affects the performance of LLM-based logic-dependent algorithms, in which LLMs serve as logical operators.
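To make the three proxies concrete, the following is a minimal sketch of how they could be scored over the verdicts of a pairwise LLM judge. The callables `judge` and `neg_judge`, and the simple fraction-based normalisation, are illustrative assumptions rather than the paper's exact definitions.

```python
from itertools import combinations
from typing import Callable, Sequence

# judge(a, b) -> True iff the model prefers candidate `a` over candidate `b`
Judge = Callable[[str, str], bool]

def transitivity(items: Sequence[str], judge: Judge) -> float:
    """Fraction of item triples whose pairwise verdicts contain no cycle
    (a > b and b > c should never coexist with c > a)."""
    triples = list(combinations(items, 3))
    acyclic = 0
    for a, b, c in triples:
        ab, bc, ca = judge(a, b), judge(b, c), judge(c, a)
        # The three edges form a cycle exactly when they all point the same
        # way around the triangle (a>b>c>a, or its full reversal).
        if not (ab == bc == ca):
            acyclic += 1
    return acyclic / len(triples)

def commutativity(items: Sequence[str], judge: Judge) -> float:
    """Fraction of pairs whose verdict is stable when the two candidates
    swap presentation order (a preferred one way implies b not preferred
    the other way)."""
    pairs = list(combinations(items, 2))
    stable = sum(judge(a, b) != judge(b, a) for a, b in pairs)
    return stable / len(pairs)

def negation_invariance(items: Sequence[str], judge: Judge,
                        neg_judge: Judge) -> float:
    """Fraction of pairs where the negated question (e.g. "which is worse?",
    modelled here by `neg_judge`) flips the verdict, as it logically should."""
    pairs = list(combinations(items, 2))
    consistent = sum(judge(a, b) != neg_judge(a, b) for a, b in pairs)
    return consistent / len(pairs)

# Toy check: a judge that ranks strings by length is perfectly consistent.
items = ["a", "bb", "ccc", "dddd"]
judge = lambda x, y: len(x) > len(y)
print(transitivity(items, judge))                                       # 1.0
print(commutativity(items, judge))                                      # 1.0
print(negation_invariance(items, judge, lambda x, y: len(x) < len(y)))  # 1.0
```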
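The augmentation idea can be sketched in the same spirit: aggregate the noisy, sparse pairwise annotations into one estimated ranking, then re-derive a dense and, by construction, cycle-free set of pairwise labels from it. The win-rate (Borda-style) aggregator below is an assumption for illustration; the paper's rank-aggregation method may differ.

```python
from collections import defaultdict
from itertools import combinations

def aggregate_ranking(items, comparisons):
    """comparisons: iterable of (winner, loser) annotation pairs.
    Returns items sorted by empirical win rate (ties broken arbitrarily)."""
    wins, games = defaultdict(int), defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    return sorted(items,
                  key=lambda x: wins[x] / games[x] if games[x] else 0.0,
                  reverse=True)

def densify(ranking):
    """Expand an estimated ranking back into a complete, transitive set of
    (preferred, dispreferred) pairs, usable as augmented annotations."""
    return [(ranking[i], ranking[j])
            for i, j in combinations(range(len(ranking)), 2)]

# Toy usage: three noisy annotations over four candidate responses.
items = ["r1", "r2", "r3", "r4"]
raw = [("r1", "r2"), ("r2", "r3"), ("r1", "r4")]
ranking = aggregate_ranking(items, raw)
augmented = densify(ranking)  # 6 consistent pairs instead of 3 noisy ones
```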