Transformer-based large language models (LLMs) are constrained by the fixed context window of the underlying transformer architecture, which hinders their ability to produce long and coherent outputs. Memory-augmented LLMs are a promising solution, but current approaches cannot handle long output generation tasks because they either (1) focus only on reading memory and reduce its evolution to the concatenation of new memories, or (2) use highly specialized memories that cannot adapt to other domains. This paper presents L2MAC, the first practical LLM-based stored-program automatic computer (von Neumann architecture) framework: an LLM-based multi-agent system for long and consistent output generation. Its memory has two components: the instruction registry, which is populated with a prompt program to solve the user-given task, and a file store, which contains the final and intermediate outputs. Each instruction is executed in turn by a separate LLM agent, whose context is managed by a control unit capable of precise memory reading and writing to ensure effective interaction with the file store. These components enable L2MAC to generate extensive outputs, bypassing the constraints of the finite context window while producing outputs that fulfill a complex user-specified task. We empirically demonstrate that L2MAC achieves state-of-the-art performance in generating large codebases for system design tasks, significantly outperforming other coding methods in implementing the detailed user-specified task, and we provide insights into the reasons for this performance gap.
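To make the division of labor concrete, here is a minimal sketch of the execution loop described above. It is not the L2MAC implementation: the names `FileStore`, `ControlUnit`, `toy_agent`, and `max_context_chars` are hypothetical, the LLM agent is replaced by a deterministic stub, and context management is reduced to truncating a context that holds only the current instruction and a listing of file names. It illustrates one point: outputs written to the file store can grow beyond the per-agent context limit because each instruction runs in a fresh, bounded context.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical agent interface (assumption, not L2MAC's API): the agent receives a
# bounded context string plus a read function for the file store, and returns
# the files it wants to write as {path: content}.
Agent = Callable[[str, Callable[[str], str]], Dict[str, str]]


@dataclass
class FileStore:
    """Holds final and intermediate outputs; its size is not tied to the context."""
    files: Dict[str, str] = field(default_factory=dict)

    def read(self, path: str) -> str:
        return self.files.get(path, "")

    def write(self, path: str, content: str) -> None:
        self.files[path] = content


@dataclass
class ControlUnit:
    """Executes each instruction in turn, each with a fresh, bounded context."""
    instructions: List[str]        # the instruction registry (prompt program)
    store: FileStore
    max_context_chars: int = 200   # toy stand-in for the LLM context window

    def build_context(self, instruction: str) -> str:
        # Only the current instruction and the file names enter the context;
        # file contents stay in the store and are read on demand.
        listing = ", ".join(sorted(self.store.files)) or "(empty)"
        context = f"Instruction: {instruction}\nFiles: {listing}"
        return context[: self.max_context_chars]

    def run(self, agent: Agent) -> FileStore:
        for instruction in self.instructions:
            context = self.build_context(instruction)
            # The control unit, not the agent, applies writes to the store.
            for path, content in agent(context, self.store.read).items():
                self.store.write(path, content)
        return self.store


def toy_agent(context: str, read: Callable[[str], str]) -> Dict[str, str]:
    # Deterministic stub in place of an LLM: appends the current instruction to
    # log.txt. Simplification: it reads the whole file, whereas a real agent
    # would have to fit whatever it reads into its own bounded context.
    instruction = context.splitlines()[0][len("Instruction: "):]
    return {"log.txt": read("log.txt") + instruction + "\n"}
```

For example, running 30 instructions with `toy_agent` produces a `log.txt` longer than `max_context_chars`, even though no single agent context ever exceeds that limit.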