Continued improvements in language model capability have unlocked their widespread use as drivers of autonomous agents, for example in coding or computer-use applications. However, the core of these systems has not changed much since early instruction-tuned models like ChatGPT. Even advanced AI agents still operate on message-exchange formats, sequentially exchanging messages with users, the system, themselves (i.e., chain-of-thought), and tools in a single stream of computation. Restricting chat models to a single stream creates a bottleneck with a number of limitations: the agent cannot act (generate output) while reading and, conversely, cannot react to new information while writing. Similarly, the agent cannot act while thinking and cannot think while reading or acting on information. In this work, we show that models can be unblocked by switching from instruction-tuning for sequential message formats to instruction-tuning for multiple, parallel streams of computation, splitting each role into a separate stream. Every forward pass of the language model then simultaneously reads from multiple input streams and generates tokens in multiple output streams, all of which causally depend on earlier timesteps. We argue that this data-driven change remedies the usability limitations outlined above, improves model efficiency through parallelization, improves model security through better separation of concerns, and can further improve model monitorability.
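To make the causal structure of parallel streams concrete, the following is a minimal sketch of an attention mask over several streams that advance in lockstep. It is an illustration under stated assumptions, not the implementation used in this work: the function name `multistream_causal_mask`, the step-major flattening of positions, and the rule that a position sees all streams at strictly earlier timesteps plus itself are all hypothetical choices made for the example.

```python
from itertools import product


def multistream_causal_mask(num_streams: int, num_steps: int) -> list[list[bool]]:
    """Return mask[q][k], True when query position q may attend to key position k.

    Positions are flattened step-major: index = step * num_streams + stream.
    Assumption (not taken from the paper): a position sees every stream at
    strictly earlier timesteps, plus itself. Tokens written by other streams
    at the same timestep stay hidden, since they are produced in parallel.
    """
    positions = list(product(range(num_steps), range(num_streams)))
    mask = []
    for q_step, q_stream in positions:
        row = []
        for k_step, k_stream in positions:
            earlier_step = k_step < q_step
            is_self = k_step == q_step and k_stream == q_stream
            row.append(earlier_step or is_self)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Two streams (e.g. one input, one output) over two timesteps.
    for row in multistream_causal_mask(num_streams=2, num_steps=2):
        print("".join("x" if allowed else "." for allowed in row))
```

With two streams and two timesteps, each second-step position can attend to both first-step positions, but not to the other stream's token at the same step. A single-stream chat model corresponds to the special case `num_streams=1`, where the mask reduces to the usual lower-triangular causal mask.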