We introduce LLM-ARC, a neuro-symbolic framework designed to enhance the logical reasoning capabilities of Large Language Models (LLMs) by combining them with an Automated Reasoning Critic (ARC). LLM-ARC employs an Actor-Critic method in which the LLM Actor generates declarative logic programs along with tests for semantic correctness, while the Automated Reasoning Critic evaluates the code, runs the tests, and provides feedback on test failures for iterative refinement. Implemented using Answer Set Programming (ASP), LLM-ARC achieves a new state-of-the-art accuracy of 88.32% on the FOLIO benchmark, which tests complex logical reasoning capabilities. Our experiments demonstrate significant improvements over LLM-only baselines, highlighting the importance of logic test generation and iterative self-refinement. We achieve our best result using a fully automated, self-supervised training loop in which the Actor is trained on end-to-end dialog traces with Critic feedback. We discuss potential enhancements and provide a detailed error analysis, showcasing the robustness and efficacy of LLM-ARC for complex natural language reasoning tasks.
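The Actor-Critic refinement loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `actor_generate` and `critic_evaluate` are hypothetical stubs standing in for the LLM Actor (which would call a language model to produce an ASP program plus tests) and the Automated Reasoning Critic (which would run an ASP solver such as clingo against those tests). The stubs simulate one round of failure and repair.

```python
def actor_generate(problem, feedback=None):
    # Hypothetical stand-in for the LLM Actor: given the natural-language
    # problem (and any Critic feedback), return a (logic_program, tests) pair.
    # Here we simulate an Actor that repairs its program once it sees feedback.
    if feedback is None:
        return "broken_program", ["test_entailment"]
    return "fixed_program", ["test_entailment"]

def critic_evaluate(program, tests):
    # Hypothetical stand-in for the Automated Reasoning Critic: execute the
    # ASP program, run its semantic-correctness tests, and return a list of
    # failure messages (empty list means all tests pass).
    if program == "broken_program":
        return ["test_entailment failed: expected Entailed, got Unknown"]
    return []

def llm_arc(problem, max_iters=3):
    # Iterative refinement: generate, test, and feed failures back to the
    # Actor until the Critic accepts the program or the budget is exhausted.
    feedback = None
    program = None
    for _ in range(max_iters):
        program, tests = actor_generate(problem, feedback)
        failures = critic_evaluate(program, tests)
        if not failures:
            return program  # Critic accepts: all tests pass
        feedback = failures  # failures drive the next refinement round
    return program

print(llm_arc("All men are mortal. Socrates is a man. Is Socrates mortal?"))
# → fixed_program
```

In the full system, the accepted ASP program is then solved to produce the final answer (e.g., a FOLIO entailment label), and the end-to-end dialog traces from this loop supply the training data for the self-supervised Actor.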