Large language models (LLMs) are a promising avenue for natural language understanding and generation tasks. However, current LLMs are far from reliable: they are prone to generating non-factual information and, more crucially, to contradicting themselves when prompted to reason about beliefs about the world. These problems are currently addressed with large-scale fine-tuning or by delegating consistent reasoning to external tools. In this work, we strive for a middle ground and introduce a training objective based on principled probabilistic reasoning that teaches an LLM to be consistent with external knowledge in the form of a set of facts and rules. Fine-tuning with our loss on a limited set of facts enables our LLMs to be more logically consistent than previous baselines and allows them to extrapolate to unseen but semantically similar factual knowledge more systematically.
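To give a concrete flavor of what a probabilistic-consistency training objective can look like, the following minimal PyTorch sketch is an illustration only, not the paper's actual loss: it penalizes two violations of propositional logic (a statement and its negation should not both be believed, and a rule "a implies b" should not be believed together with "a and not b"), given the model's estimated probabilities that statements are true. The helper names, the example statements, and the independence assumption between statements are all hypothetical.

```python
# Hypothetical sketch of a probabilistic consistency loss (illustration only;
# the paper's actual objective may differ). Inputs are the model's estimated
# probabilities that natural-language statements are true.
import torch

def negation_consistency_loss(p_q: torch.Tensor, p_not_q: torch.Tensor) -> torch.Tensor:
    # Penalize -log of the probability mass on logically consistent worlds:
    # (q true, not-q false) or (q false, not-q true).
    consistent = p_q * (1 - p_not_q) + (1 - p_q) * p_not_q
    return -torch.log(consistent.clamp_min(1e-6))

def implication_consistency_loss(p_a: torch.Tensor, p_b: torch.Tensor) -> torch.Tensor:
    # For a rule "a -> b", the only inconsistent world is (a true, b false),
    # treating the two statements as independent Bernoulli variables.
    consistent = 1 - p_a * (1 - p_b)
    return -torch.log(consistent.clamp_min(1e-6))

# Example (hypothetical statements): q = "a sparrow is a bird",
# not-q = "a sparrow is not a bird", b = "a sparrow can fly", rule q -> b.
p_q, p_not_q, p_b = torch.tensor(0.9), torch.tensor(0.3), torch.tensor(0.6)
loss = negation_consistency_loss(p_q, p_not_q) + implication_consistency_loss(p_q, p_b)
print(float(loss))  # differentiable w.r.t. the model's probabilities during fine-tuning
```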