Emergent communication offers insight into how agents develop shared structured representations, yet most research assumes homogeneous modalities or aligned representational spaces, overlooking the perceptual heterogeneity of real-world settings. We study a heterogeneous multi-step binary communication game in which agents differ in modality and lack shared perceptual grounding. Despite this perceptual misalignment, multimodal systems converge to class-consistent messages grounded in perceptual input. Unimodal systems communicate more efficiently, using fewer bits and achieving lower classification entropy, while multimodal agents require more information exchange and exhibit higher uncertainty. Bit-perturbation experiments provide strong evidence that meaning is encoded distributionally rather than compositionally: each bit's contribution depends on the surrounding bit pattern. Finally, interoperability analyses show that systems trained in different perceptual worlds cannot communicate directly, but limited fine-tuning enables successful cross-system communication. This work positions emergent communication as a framework for studying how agents adapt and transfer representations across heterogeneous modalities, opening new directions for both theory and experimentation.