Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs

Continued improvements in language model capability have led to their widespread use as drivers of autonomous agents, for example in coding or computer-use applications. However, the core of these systems has changed little since early instruction-tuned models such as ChatGPT. Even advanced AI agents operate on message-exchange formats, successively exchanging messages with users, systems, tools and themselves (i.e. chain-of-thought) in a single stream of computation. Restricting chat models to a single stream causes several limitations: the agent cannot act (generate output) while reading and, conversely, cannot react to new information while writing. Similarly, the agent cannot act while thinking, and cannot think while reading or acting on information. In this work, we show that models can be unblocked by switching from instruction-tuning for sequential message formats to instruction-tuning for multiple parallel streams of computation, with each role split into a separate stream. Every forward pass of the language model then simultaneously reads from multiple input streams and generates tokens in multiple output streams, all of which causally depend on earlier timesteps. We argue that this data-driven change remedies the usability limitations outlined above, improves model efficiency through parallelization, improves model security through better separation of concerns, and can further improve model monitorability.
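The causal structure described in the abstract can be illustrated with a small attention-mask sketch. This is not the authors' implementation: the function name `multistream_causal_mask`, the flattening of positions as `t * num_streams + s`, and the rule that a position sees every stream at strictly earlier timesteps plus itself are all assumptions made for illustration.

```python
def multistream_causal_mask(num_streams: int, num_steps: int) -> list[list[bool]]:
    """Build a boolean attention mask over a (timestep, stream) grid.

    mask[q][k] is True when query position q may attend to key position k.
    Positions are flattened as index = t * num_streams + s.

    Assumption (not taken from the paper): a position may attend to every
    stream at strictly earlier timesteps, and to itself, but not to other
    streams at the same timestep, since those tokens are produced in the
    same forward pass.
    """
    positions = [(t, s) for t in range(num_steps) for s in range(num_streams)]
    mask = []
    for tq, sq in positions:
        row = [(tk < tq) or (tk == tq and sk == sq) for tk, sk in positions]
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Two streams (e.g. one input, one output) over three timesteps.
    for row in multistream_causal_mask(num_streams=2, num_steps=3):
        print("".join("x" if allowed else "." for allowed in row))
```

With a single stream this reduces to the usual lower-triangular causal mask; with several streams, each timestep becomes a block whose tokens are generated together and only depend on earlier blocks.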
