Recent research on Large Language Models (LLMs) has shown promising progress in aligning LLMs with human preferences. LLM-empowered decision-making systems are expected to be predictable, reliable, and trustworthy, which implies being free from paradoxes or contradictions that could undermine their credibility and validity. However, LLMs still exhibit inconsistent and biased behaviour when making decisions or judgements. In this work, we focus on studying the logical consistency of LLMs as a prerequisite for more reliable and trustworthy systems. Logical consistency ensures that decisions are based on a stable and coherent understanding of the problem, reducing the risk of erratic or contradictory outputs. We first propose a universal framework to quantify logical consistency via three fundamental proxies: transitivity, commutativity, and negation invariance. We then evaluate the logical consistency of a wide range of LLMs using the defined measures, demonstrating that it can serve as a strong proxy for overall robustness. Additionally, we introduce a data refinement and augmentation technique that enhances the logical consistency of LLMs without sacrificing alignment with human preferences: it augments noisy and sparse pairwise-comparison annotations by estimating a partially or totally ordered preference ranking using rank aggregation methods. Finally, we show that logical consistency affects the performance of LLM-based logic-dependent algorithms, in which LLMs serve as logical operators.
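To illustrate how a proxy such as transitivity could be operationalized, the following minimal sketch counts the fraction of ordered triples that satisfy the transitivity implication (a ≻ b and b ≻ c should imply a ≻ c). The `judge` function here is a hypothetical stand-in for an LLM pairwise comparator, and the exact metric in the paper may differ:

```python
from itertools import permutations

def judge(a: int, b: int) -> bool:
    # Hypothetical stand-in for an LLM pairwise judge:
    # returns True if item `a` is preferred over item `b`.
    # This toy judge compares integers directly, so it is
    # perfectly transitive by construction.
    return a > b

def transitivity_score(items) -> float:
    """Fraction of ordered triples (a, b, c) for which
    judge(a, b) and judge(b, c) imply judge(a, c)."""
    triples = [t for t in permutations(items, 3)
               if judge(t[0], t[1]) and judge(t[1], t[2])]
    if not triples:
        return 1.0  # vacuously consistent: no applicable triples
    consistent = sum(judge(a, c) for a, b, c in triples)
    return consistent / len(triples)

print(transitivity_score(range(5)))  # a fully transitive judge scores 1.0
```

Replacing `judge` with calls to an actual model turns this into an empirical consistency probe; an inconsistent judge would score below 1.0.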
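The rank-aggregation step can be sketched with a simple Copeland-style win/loss count. This is an illustrative stand-in, not the paper's specific aggregation method: each pairwise win adds a point and each loss subtracts one, and sorting by score yields a total order even when the raw annotations are noisy or contradictory:

```python
from collections import defaultdict

def aggregate_ranking(comparisons):
    """Copeland-style aggregation over (winner, loser) pairs:
    +1 per pairwise win, -1 per loss; sort by score descending."""
    score = defaultdict(int)
    for winner, loser in comparisons:
        score[winner] += 1
        score[loser] -= 1
    return sorted(score, key=score.get, reverse=True)

# Noisy, sparse annotations: one comparison ("c" beats "a")
# contradicts the majority signal a > b > c.
pairs = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("a", "c")]
print(aggregate_ranking(pairs))  # ['a', 'b', 'c']
```

Training on pairs re-derived from such an aggregated ranking, rather than on the raw contradictory annotations, is one way to supply a logically consistent supervision signal.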