Causal reasoning is the primary bottleneck that Large Language Models (LLMs) must overcome to attain human-level intelligence. To address this, we introduce Causal Chain of Prompting (C2P), the first reasoning framework that equips current LLMs with causal reasoning capabilities. C2P operates autonomously, without relying on external tools or modules during either the causal learning or the reasoning phase, and can be seamlessly implemented during the training or fine-tuning of LLMs. Experimental results across various benchmark datasets demonstrate significant improvements in the causal learning and subsequent reasoning accuracy of LLMs. We illustrate how C2P enhances LLMs' ability to reason causally in real-world scenarios, addressing complex problems in fields such as healthcare, medicine, economics, education, social sciences, environmental science, and marketing. With few-shot learning, GPT-4 Turbo using C2P achieves, with as few as six examples, an increase of more than 33% in reasoning accuracy over state-of-the-art LLMs, which perform nearly randomly in similar circumstances. These results demonstrate the transformative potential of integrating C2P into LLM training or fine-tuning processes, empowering these models with advanced causal reasoning capabilities.