We study secure joint source-channel coding for multimodal semantic sources transmitted over noisy wiretap channels. The source model consists of $m$ modalities (e.g., image, audio, and sensor data), each represented as a random variable. The encoder observes independent and identically distributed samples of an arbitrary non-empty subset of the modalities. These samples are encoded and transmitted over a discrete memoryless wiretap channel, and the legitimate receiver reconstructs all $m$ modalities. We extend the rate-distortion-perception formulation to multimodal sources and establish converse and achievability bounds on the fundamental limits of transmission rate, fidelity, and secrecy under per-modality distortion and perception constraints and per-subset equivocation constraints. We show that the fundamental limit for secrecy consists of three operationally distinct components: the level of compression, the secret key rate, and the statistics of the wiretap channel.