We introduce SwiftSage, a novel agent framework inspired by the dual-process theory of human cognition and designed to excel at action planning for complex interactive reasoning tasks. SwiftSage combines the strengths of behavior cloning and of prompting large language models (LLMs) to improve task completion performance. The framework comprises two primary modules: the Swift module, representing fast and intuitive thinking, and the Sage module, emulating deliberate thought processes. The Swift module is a small encoder-decoder LM fine-tuned on the oracle agent's action trajectories, while the Sage module employs LLMs such as GPT-4 for subgoal planning and grounding. We develop a heuristic method to integrate the two modules, resulting in a more efficient and robust problem-solving process. Across 30 tasks from the ScienceWorld benchmark, SwiftSage significantly outperforms other methods such as SayCan, ReAct, and Reflexion, demonstrating its effectiveness in solving complex interactive tasks.
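The abstract describes the two-module design but does not spell out the integration heuristic. As a rough illustration only, the following minimal sketch shows one way a fast policy and a slow planner could be combined. Everything in it is an assumption made for illustration: the `DualProcessAgent` class, the `FastPolicy` and `SlowPlanner` signatures, the `patience` threshold, and the "no positive reward for several steps" trigger are hypothetical and are not taken from SwiftSage itself.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interfaces; the abstract does not define these signatures.
FastPolicy = Callable[[str], str]          # observation -> one action
SlowPlanner = Callable[[str], List[str]]   # observation -> planned action sequence


@dataclass
class DualProcessAgent:
    fast: FastPolicy           # stands in for a Swift-like fine-tuned small LM
    slow: SlowPlanner          # stands in for a Sage-like prompted LLM planner
    patience: int = 2          # assumed threshold: stalled steps before deliberating
    stalled: int = 0           # consecutive steps without positive reward
    buffer: List[str] = field(default_factory=list)  # pending planned actions

    def act(self, observation: str, last_reward: float) -> str:
        # Track how long the agent has gone without positive reward (assumed trigger).
        self.stalled = 0 if last_reward > 0 else self.stalled + 1

        # Execute any actions the slow planner has already produced.
        if self.buffer:
            return self.buffer.pop(0)

        # Escalate to the slow planner when the fast policy appears stuck.
        if self.stalled >= self.patience:
            self.stalled = 0
            plan = self.slow(observation)
            if plan:
                self.buffer = list(plan[1:])
                return plan[0]

        # Default path: fast, intuitive action selection.
        return self.fast(observation)
```

In use, `fast` and `slow` would wrap real model calls; with stub callables the agent follows the fast policy until `patience` consecutive steps pass without positive reward, then executes the planner's actions in order before returning to the fast policy.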