We present LatentCSI, a novel method for generating images of the physical environment from WiFi CSI measurements that leverages a pretrained latent diffusion model (LDM). Unlike prior approaches that rely on complex and computationally intensive techniques such as GANs, our method employs a lightweight neural network to map CSI amplitudes directly into the latent space of an LDM. We then run the LDM's denoising diffusion process on this latent representation with text-based guidance, and decode the result with the LDM's pretrained decoder to obtain a high-resolution image. This design bypasses the challenges of pixel-space image generation and avoids the explicit image-encoding stage typically required in conventional image-to-image pipelines, enabling efficient, high-quality image synthesis. We validate our approach on two datasets: a wide-band CSI dataset we collected with off-the-shelf WiFi devices and cameras, and a subset of the publicly available MM-Fi dataset. The results demonstrate that LatentCSI outperforms baselines of comparable complexity trained directly on ground-truth images in both computational efficiency and perceptual quality, while additionally providing practical advantages through its unique capacity for text-guided controllability.
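The pipeline described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the encoder architecture, the CSI input dimension (2025 subcarriers), and the latent shape (4×64×64, as used by Stable-Diffusion-style LDMs for 512×512 images) are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class CSIToLatent(nn.Module):
    """Hypothetical lightweight encoder mapping a vector of CSI
    amplitudes directly into an LDM's latent space. Layer sizes
    are illustrative assumptions, not the paper's architecture."""

    def __init__(self, n_subcarriers=2025, latent_shape=(4, 64, 64)):
        super().__init__()
        self.latent_shape = latent_shape
        out_dim = latent_shape[0] * latent_shape[1] * latent_shape[2]
        self.net = nn.Sequential(
            nn.Linear(n_subcarriers, 1024),
            nn.ReLU(),
            nn.Linear(1024, out_dim),
        )

    def forward(self, csi_amp):
        # Map CSI amplitudes to a latent tensor; no pixel-space
        # generation and no VAE image encoder is involved.
        z = self.net(csi_amp)
        return z.view(-1, *self.latent_shape)

model = CSIToLatent()
csi = torch.rand(2, 2025)   # a batch of two CSI amplitude vectors
latents = model(csi)
print(latents.shape)        # torch.Size([2, 4, 64, 64])
```

In the full method, the predicted latent would then seed a text-guided, image-to-image-style denoising pass over the latent (rather than starting from an encoded photograph), after which the LDM's pretrained decoder produces the final high-resolution image.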