Closed-source agents are costly, opaque, and hard to reproduce, particularly on complex interactive tasks. This motivates the development of open-source alternatives. We introduce LUMOS, one of the first frameworks for training open-source LLM-based agents. LUMOS features a learnable, unified, and modular architecture: a planning module learns to generate high-level subgoals, and a grounding module is trained to translate these subgoals into actions that use various tools in the execution module. The design allows for modular upgrades and wider applicability to diverse interactive tasks. To foster generalizable agent learning, we collect large-scale, unified, and high-quality training annotations derived from diverse ground-truth reasoning rationales across various complex interactive tasks. On 9 datasets, LUMOS exhibits several key advantages: (1) LUMOS outperforms multiple larger open-source agents on the held-out datasets (unused for training) for each task type, and even surpasses GPT agents on QA and web tasks; (2) LUMOS outperforms open-source agents produced by chain-of-thought and unmodularized integrated training; and (3) LUMOS effectively generalizes to unseen tasks, outperforming 33B-scale agents and domain-specific agents.
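The planning, grounding, and execution flow described in the abstract can be pictured with a minimal sketch. Everything below is hypothetical and for illustration only: the names `plan`, `ground`, `TOOLS`, `Action`, and `run_agent` are assumptions, and the two module functions are hard-coded stand-ins for what would be trained language models in LUMOS. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    tool: str      # name of a tool registered in the (hypothetical) execution module
    argument: str  # single string argument, kept simple for illustration


def plan(task: str) -> List[str]:
    """Stand-in for the planning module: maps a task to high-level subgoals."""
    return [f"Find information relevant to: {task}", "Compose the final answer"]


def ground(subgoal: str) -> Action:
    """Stand-in for the grounding module: maps a subgoal to an executable action."""
    if subgoal.startswith("Find information"):
        # Pass the original task text to the search tool.
        return Action(tool="search", argument=subgoal.split(": ", 1)[1])
    return Action(tool="answer", argument=subgoal)


# Stand-in for the execution module: a registry of callable tools.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: f"[search results for '{query}']",
    "answer": lambda text: f"[answer produced for '{text}']",
}


def run_agent(task: str) -> List[str]:
    """Run plan -> ground -> execute for each subgoal and collect tool outputs."""
    results = []
    for subgoal in plan(task):
        action = ground(subgoal)
        results.append(TOOLS[action.tool](action.argument))
    return results
```

Because the three stages only communicate through subgoal strings and `Action` objects, any one of them could be swapped out without touching the others, which is the kind of modular upgrade the abstract refers to.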