We introduce LLM-ARC, a neuro-symbolic framework designed to enhance the logical reasoning capabilities of Large Language Models (LLMs) by combining them with an Automated Reasoning Critic (ARC). LLM-ARC employs an Actor-Critic method in which the LLM Actor generates declarative logic programs along with tests for semantic correctness, while the Automated Reasoning Critic evaluates the code, runs the tests, and provides feedback on test failures for iterative refinement. Implemented using Answer Set Programming (ASP), LLM-ARC achieves a new state-of-the-art accuracy of 88.32% on the FOLIO benchmark, which tests complex logical reasoning capabilities. Our experiments demonstrate significant improvements over LLM-only baselines, highlighting the importance of logic test generation and iterative self-refinement. We achieve our best result using a fully automated self-supervised training loop in which the Actor is trained on end-to-end dialog traces with Critic feedback. We discuss potential enhancements and provide a detailed error analysis, showcasing the robustness and efficacy of LLM-ARC for complex natural language reasoning tasks.
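The Actor-Critic refinement loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate_program` and `run_asp_tests` are hypothetical stand-ins for the LLM Actor and the ASP-based Critic, stubbed here so the control flow is runnable.

```python
def generate_program(problem, feedback=None):
    """Hypothetical LLM Actor: returns (asp_program, tests).
    Stubbed to 'repair' the program once Critic feedback arrives."""
    if feedback is None:
        # Buggy first draft: every bird flies.
        return "flies(X) :- bird(X).", ["test_penguin"]
    # Revised draft incorporating the failure feedback.
    return "flies(X) :- bird(X), not penguin(X).", ["test_penguin"]

def run_asp_tests(program, tests):
    """Hypothetical Critic: runs semantic-correctness tests against the
    program and returns failure messages (empty list means all passed)."""
    if "not penguin" not in program:
        return ["test_penguin failed: penguins should not fly"]
    return []

def refine(problem, max_iters=3):
    """Iterate Actor generation and Critic evaluation until tests pass
    or the iteration budget is exhausted."""
    feedback = None
    program = ""
    for _ in range(max_iters):
        program, tests = generate_program(problem, feedback)
        failures = run_asp_tests(program, tests)
        if not failures:
            return program, True
        feedback = failures  # feed test failures back to the Actor
    return program, False

program, ok = refine("Do penguins fly?")
```

In the actual system, the Actor is an LLM emitting ASP code and the Critic is an automated reasoner (e.g., an ASP solver) that grounds and solves the program against the generated tests; the dialog traces produced by this loop are what the Actor is later trained on.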