Large language models (LLMs) have achieved remarkable progress in solving various natural language processing tasks due to emergent reasoning abilities. However, LLMs have inherent limitations: they cannot access up-to-date information (stored on the Web or in task-specific knowledge bases), use external tools, or perform precise mathematical and logical reasoning. In this paper, we present Chameleon, an AI system that mitigates these limitations by augmenting LLMs with plug-and-play modules for compositional reasoning. Chameleon synthesizes programs by composing various tools (e.g., LLMs, off-the-shelf vision models, web search engines, Python functions, and heuristic-based modules) to accomplish complex reasoning tasks. At the heart of Chameleon is an LLM-based planner that assembles a sequence of tools whose execution generates the final response. We showcase the effectiveness of Chameleon on two multi-modal knowledge-intensive reasoning tasks: ScienceQA and TabMWP. Chameleon, powered by GPT-4, achieves an 86.54% overall accuracy on ScienceQA, improving the best published few-shot result by 11.37%. On TabMWP, GPT-4-powered Chameleon improves the accuracy by 17.0%, lifting the state of the art to 98.78%. Our analysis also shows that, compared to a ChatGPT-powered planner, the GPT-4-powered planner exhibits more consistent and rational tool selection by inferring potential constraints from instructions. The project is available at https://chameleon-llm.github.io.
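The plan-then-execute idea described above can be illustrated with a minimal sketch. It is not the Chameleon implementation: the module names (`table_lookup`, `program_executor`, `answer_generator`), the shared context dictionary, and the keyword-based `plan` function are all hypothetical stand-ins chosen for illustration. In particular, `plan` uses simple rules where the system described here would prompt an LLM to choose the tool sequence.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical convention: every module reads and updates a shared context dict.
Context = Dict[str, object]
Module = Callable[[Context], Context]


def table_lookup(ctx: Context) -> Context:
    """Collect the values of one column from a list-of-dicts table."""
    ctx["values"] = [row[ctx["column"]] for row in ctx["table"]]
    return ctx


def program_executor(ctx: Context) -> Context:
    """Do the exact arithmetic in Python instead of asking a model."""
    ctx["result"] = sum(ctx.get("values", []))
    return ctx


def answer_generator(ctx: Context) -> Context:
    """Turn the intermediate result into a final response string."""
    ctx["answer"] = f"The answer is {ctx['result']}."
    return ctx


# Plug-and-play inventory: adding a tool means adding one entry here.
MODULES: Dict[str, Module] = {
    "table_lookup": table_lookup,
    "program_executor": program_executor,
    "answer_generator": answer_generator,
}


def plan(question: str, available: List[str]) -> List[str]:
    """Rule-based stand-in for an LLM planner (assumption, not the real planner)."""
    q = question.lower()
    steps: List[str] = []
    if "table" in q:
        steps.append("table_lookup")
    if "total" in q or "sum" in q:
        steps.append("program_executor")
    steps.append("answer_generator")
    # Keep only tools that are actually registered.
    return [s for s in steps if s in available]


def run(question: str, ctx: Context) -> Tuple[List[str], Context]:
    """Plan a tool sequence, then execute it step by step on the shared context."""
    program = plan(question, list(MODULES))
    for name in program:
        ctx = MODULES[name](ctx)
    return program, ctx


program, result = run(
    "Using the table, what is the total number of apples sold?",
    {
        "table": [{"day": "Mon", "apples": 3}, {"day": "Tue", "apples": 5}],
        "column": "apples",
    },
)
print(program)           # ['table_lookup', 'program_executor', 'answer_generator']
print(result["answer"])  # The answer is 8.
```

Keeping the planner separate from the module registry is the design point of the sketch: the tool sequence is data, so tools can be added or swapped without changing the execution loop.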