We present TapeAgents, an agent framework built around the tape: a granular, structured log of the agent session that also serves as the session's resumable state. In TapeAgents, we use tapes to support every stage of the LLM agent development lifecycle. The agent reasons by processing the tape and the LLM output to produce new thought and action steps, which it appends to the tape. The environment then reacts to the agent's actions by likewise appending observation steps to the tape. By virtue of this tape-centred design, TapeAgents can provide AI practitioners with holistic end-to-end support. At the development stage, tapes facilitate session persistence, agent auditing, and step-by-step debugging. Post-deployment, one can reuse tapes for evaluation, fine-tuning, and prompt-tuning; crucially, one can adapt tapes from other agents or use revised historical tapes. In this report, we explain the TapeAgents design in detail. We demonstrate possible applications of TapeAgents with several concrete examples: building monolithic agents and multi-agent teams, optimizing agent prompts, and fine-tuning the agent's LLM. We present tooling prototypes and report a case study in which we use TapeAgents to fine-tune a Llama-3.1-8B form-filling assistant to perform as well as GPT-4o while being orders of magnitude cheaper. Lastly, our comparative analysis shows that the advantages of TapeAgents over prior frameworks stem from our novel design of the LLM agent as a resumable, modular state machine with a structured configuration that generates granular, structured logs and can transform these logs into training text, a combination of features absent from previous work.
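The tape-centred loop described in the abstract can be illustrated with a minimal sketch. This is not the TapeAgents API: the names `Step`, `Tape`, `toy_llm`, `agent_turn`, `environment_turn`, and `run` are hypothetical, and the LLM is replaced by a deterministic stand-in. The sketch only shows the idea that the agent appends thought and action steps, the environment appends observation steps, and the serialized tape is enough to resume a session.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Step:
    kind: str     # "thought", "action", or "observation"
    content: str


@dataclass
class Tape:
    """Hypothetical tape: an append-only log that doubles as session state."""
    steps: List[Step] = field(default_factory=list)

    def append(self, step: Step) -> None:
        self.steps.append(step)

    def to_json(self) -> str:
        return json.dumps([asdict(s) for s in self.steps])

    @classmethod
    def from_json(cls, text: str) -> "Tape":
        return cls([Step(**d) for d in json.loads(text)])


def toy_llm(tape: Tape) -> str:
    """Stand-in for an LLM call (assumption): decides from the tape alone."""
    has_observation = any(s.kind == "observation" for s in tape.steps)
    return "finish" if has_observation else "lookup"


def agent_turn(tape: Tape) -> None:
    """The agent reads the tape and appends a thought step and an action step."""
    decision = toy_llm(tape)
    tape.append(Step("thought", f"next I should {decision}"))
    tape.append(Step("action", decision))


def environment_turn(tape: Tape) -> None:
    """The environment reacts to the last action by appending an observation."""
    last = tape.steps[-1]
    if last.kind == "action" and last.content != "finish":
        tape.append(Step("observation", f"result of {last.content}"))


def run(tape: Tape, max_turns: int = 5) -> Tape:
    """Alternate agent and environment turns until the agent finishes."""
    for _ in range(max_turns):
        agent_turn(tape)
        if tape.steps[-1].content == "finish":
            break
        environment_turn(tape)
    return tape


if __name__ == "__main__":
    full = run(Tape())
    # Persist the first three steps, reload them, and resume from there.
    saved = Tape(full.steps[:3]).to_json()
    resumed = run(Tape.from_json(saved))
    print([s.kind for s in resumed.steps])
```

Because every decision in this sketch depends only on the tape, a session reloaded from a saved prefix continues exactly as the original run did. That is the property the abstract refers to when it calls the tape the session's resumable state.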