The transition to 6G calls for tightly integrated sensing and communication (ISAC) to support mission-critical services such as autonomous driving, embodied AI, and high-precision telemedicine. However, most existing ISAC designs rely on a single sensing modality (often RF), which limits environmental understanding and becomes a bottleneck in complex, dynamic scenes. This motivates a shift from single-modal to multimodal ISAC, in which heterogeneous sensors (e.g., radar, LiDAR, and cameras) complement one another to improve robustness and semantic awareness. In this article, we first summarize the key challenges for multimodal ISAC: heterogeneous fusion, communication overhead, and scalable system design. We then highlight three enabling technologies (large AI models, semantic communications, and multi-agent systems) and discuss how their combination can enable task-oriented multimodal perception. Building on these insights, we propose a unified cloud-edge-terminal (CET) framework that distributes intelligence hierarchically and supports three adaptive operation modes: global fusion mode (GFM), cooperative relay mode (CRM), and peer interaction mode (PIM). A case study evaluates the framework in all three modes and shows that GFM achieves the highest accuracy, PIM achieves the lowest latency, and CRM offers the best balance between performance and efficiency. Finally, we conclude with open research issues and future directions.