Deep generative models offer a powerful alternative to conventional channel estimation by learning complex channel distributions. Leveraging the rich environmental information available in modern sensing-aided networks, this paper proposes MultiCE-Flow, a multimodal channel estimation framework based on flow matching and a diffusion transformer (DiT). We design a specialized multimodal perception module that fuses LiDAR, camera, and location data into a semantic condition, while treating sparse pilots as a structural condition. These conditions guide a DiT backbone to reconstruct high-fidelity channels. Unlike standard diffusion models, we employ flow matching to learn a linear trajectory from noise to data, enabling efficient one-step sampling. By exploiting environmental semantics, our method mitigates the ill-posed nature of estimation with sparse pilots. Extensive experiments demonstrate that MultiCE-Flow consistently outperforms both traditional baselines and existing generative models. Notably, it exhibits superior robustness to out-of-distribution scenarios and varying pilot densities, making it well suited for environment-aware communication systems.
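The linear-trajectory property behind the one-step sampling claim can be illustrated with a minimal numerical sketch. This is not the paper's implementation (MultiCE-Flow trains a conditioned DiT to predict the velocity); here the true velocity of the linear interpolation path is used on toy Gaussian "channel" samples, so all names (`fm_pair`, the toy data) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for channel samples H (a shifted Gaussian, purely illustrative).
data = rng.normal(loc=3.0, scale=0.5, size=(1024, 1))

def fm_pair(x1, rng):
    """Build one flow-matching training pair on the linear path.

    x_t = (1 - t) * x0 + t * x1, with x0 ~ N(0, I);
    the regression target is the constant velocity v = x1 - x0.
    """
    x0 = rng.normal(size=x1.shape)
    t = rng.uniform(size=(x1.shape[0], 1))
    xt = (1.0 - t) * x0 + t * x1
    v = x1 - x0
    return xt, t, v

xt, t, v = fm_pair(data, rng)
x0 = xt - t * v           # invert the interpolation to recover the noise sample
x1_hat = x0 + 1.0 * v     # one Euler step over the full interval [0, 1]
print(np.allclose(x1_hat, data))  # the straight path makes one-step sampling exact
```

Because the interpolation path is a straight line with constant velocity, a single Euler step from noise reaches the data endpoint exactly when the velocity is known; a trained velocity network approximates this, which is why one-step sampling degrades far less than it would for the curved trajectories of standard diffusion.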