Multimodal signals, including text, audio, image, and video, can be integrated into Semantic Communication (SC) to provide an immersive experience with low latency and high quality at the semantic level. However, multimodal SC faces several challenges, including data heterogeneity, semantic ambiguity, and signal fading. Recent advances in large AI models, particularly the Multimodal Language Model (MLM) and the Large Language Model (LLM), offer potential solutions to these issues. To this end, we propose a Large AI Model-based Multimodal SC (LAM-MSC) framework. First, we present MLM-based Multimodal Alignment (MMA), which uses the MLM to transform between multimodal and unimodal data while preserving semantic consistency. Second, we propose a personalized LLM-based Knowledge Base (LKB), which allows users to perform personalized semantic extraction or recovery through the LLM and thereby addresses semantic ambiguity. Third, we apply Conditional Generative Adversarial Network-based Channel Estimation (CGE) to obtain Channel State Information (CSI), which mitigates the impact of fading channels in SC. Finally, simulations demonstrate the superior performance of the LAM-MSC framework.