ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models

Although large language models (LLMs) have achieved excellent performance in a variety of evaluation benchmarks, they still struggle in complex reasoning tasks which require specific knowledge and multi-hop reasoning. To improve the reasoning abilities, we propose ChatCoT, a tool-augmented chain-of-thought reasoning framework for chat-based LLMs (e.g., ChatGPT). In ChatCoT, we model the chain-of-thought (CoT) reasoning as multi-turn conversations, to utilize tools in a more natural way through chatting. At each turn, LLMs can either interact with tools or perform the reasoning. Our approach can effectively leverage the multi-turn conversation ability of chat-based LLMs, and integrate the thought chain following and tools manipulation in a unified way. Specially, we initialize the early turns of the conversation by the knowledge about tools, tasks, and reasoning format, and propose an iterative tool-augmented reasoning step to perform step-by-step tool-augmented reasoning. The experiment results on two complex reasoning datasets (MATH and HotpotQA) have shown the effectiveness of ChatCoT on complex reasoning tasks, achieving a 7.9% relative improvement over the state-of-the-art baseline. Our code and data are available at: \url{https://github.com/RUCAIBOX/ChatCoT}.

翻译：尽管大规模语言模型（LLMs）在多种评估基准中取得了优异性能，但在需要特定知识与多跳推理的复杂推理任务中仍存在不足。为提升推理能力，我们提出ChatCoT——一种面向聊天式LLMs（如ChatGPT）的工具增强思维链推理框架。在ChatCoT中，我们将思维链（CoT）推理建模为多轮对话，通过聊天方式更自然地使用工具。每轮对话中，LLMs可与工具交互或执行推理。该方法能有效利用聊天式LLMs的多轮对话能力，将思维链遵循与工具操作统一整合。具体而言，我们通过工具、任务及推理格式的相关知识初始化对话早期轮次，并提出迭代式工具增强推理步骤，实现逐步的工具辅助推理。在两个复杂推理数据集（MATH与HotpotQA）上的实验结果表明，ChatCoT在复杂推理任务中具有有效性，相较于最先进基线实现了7.9%的相对性能提升。我们的代码与数据已开源至：\url{https://github.com/RUCAIBOX/ChatCoT}。

相关内容

TOOLS

关注 1

这个新版本的工具会议系列恢复了从1989年到2012年的50个会议的传统。工具最初是“面向对象语言和系统的技术”，后来发展到包括软件技术的所有创新方面。今天许多最重要的软件概念都是在这里首次引入的。2019年TOOLS 50+1在俄罗斯喀山附近举行，以同样的创新精神、对所有与软件相关的事物的热情、科学稳健性和行业适用性的结合以及欢迎该领域所有趋势和社区的开放态度，延续了该系列。官网链接：http://tools2019.innopolis.ru/

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日