Multi-Agent Systems (MAS) powered by Large Language Models have unlocked advanced collaborative reasoning, yet they remain shackled by the inefficiency of discrete text communication, which imposes significant runtime overhead and information quantization loss. While latent state transfer offers a high-bandwidth alternative, existing approaches either assume homogeneous sender-receiver architectures or rely on pair-specific learned translators, limiting scalability and modularity across diverse model families with disjoint manifolds. In this work, we propose the Vision Wormhole, a novel framework that repurposes the visual interface of Vision-Language Models (VLMs) to enable model-agnostic, text-free communication. By introducing a Universal Visual Codec, we map heterogeneous reasoning traces into a shared continuous latent space and inject them directly into the receiver's visual pathway, effectively treating the vision encoder as a universal port for inter-agent telepathy. Our framework adopts a hub-and-spoke topology to reduce pairwise alignment complexity from O(N^2) to O(N) and leverages a label-free, teacher-student distillation objective to align the high-speed visual channel with the robust reasoning patterns of the text pathway. Extensive experiments across heterogeneous model families (e.g., Qwen-VL, Gemma) demonstrate that the Vision Wormhole reduces end-to-end wall-clock time in controlled comparisons while maintaining reasoning fidelity comparable to standard text-based MAS. Code is available at https://github.com/xz-liu/heterogeneous-latent-mas
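The O(N^2) to O(N) claim can be illustrated with a back-of-the-envelope count. The sketch below is illustrative arithmetic rather than the paper's implementation. It assumes that a pair-specific design needs one learned translator per ordered sender-receiver pair, and that a hub-and-spoke design needs one encoder and one decoder per model into the shared space. The function names are hypothetical.

```python
def pairwise_translators(n: int) -> int:
    """Translators needed if every ordered (sender, receiver) pair gets its own (assumption)."""
    return n * (n - 1)


def hub_adapters(n: int) -> int:
    """Adapters needed if each model only maps to and from one shared space.

    Assumption: one encoder plus one decoder per model.
    """
    return 2 * n


if __name__ == "__main__":
    # Compare how the two counts grow with the number of model families.
    for n in (2, 4, 8, 16):
        print(f"N={n:2d}  pairwise={pairwise_translators(n):3d}  hub={hub_adapters(n):2d}")
```

Under these assumptions the pairwise count grows quadratically while the hub count grows linearly, so adding a new model family costs a fixed number of new adapters rather than one per existing model.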