Following the recent success of machine learning tools in wireless communications, Weaver's 1949 idea of semantic communication has gained renewed attention. It breaks with Shannon's classic design paradigm by aiming to transmit the meaning, i.e., the semantics, of a message rather than its exact version, which allows for savings in information rate. In this work, we apply the stochastic policy gradient (SPG) to design a semantic communication system by reinforcement learning. This approach separates transmitter and receiver and does not require a known or differentiable channel model, a crucial step towards deployment in practice. Further, we derive the use of SPG for both classic and semantic communication from the maximization of the mutual information between received and target variables. Numerical results show that our approach achieves performance comparable to a model-aware approach based on the reparametrization trick, albeit with slower convergence.
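To illustrate why a stochastic policy gradient needs neither a known nor a differentiable channel model, the following is a minimal sketch of a score-function gradient estimate on a toy problem. Everything specific here is an assumption for illustration and is not taken from the work itself: the scalar transmitter parameter `theta`, the Gaussian exploration policy with standard deviation `sigma`, the additive-noise `black_box_channel`, and the squared-error loss, which stands in for the mutual-information-based objective mentioned in the abstract.

```python
import numpy as np

def black_box_channel(x, rng, noise_std=0.1):
    """Hypothetical additive-noise channel; the learner only observes its outputs."""
    return x + noise_std * rng.standard_normal(x.shape)

def spg_gradient(theta, target, rng, sigma=0.3, batch=200_000):
    """Score-function estimate of d/dtheta E[loss] under a Gaussian policy.

    No derivative of the channel is used: only sampled actions, the
    resulting per-sample losses, and the gradient of the log-policy.
    """
    x = theta + sigma * rng.standard_normal(batch)  # actions x ~ N(theta, sigma^2)
    y = black_box_channel(x, rng)                   # channel queried as a black box
    loss = (y - target) ** 2                        # assumed per-sample loss
    baseline = loss.mean()                          # simple baseline to reduce variance
    score = (x - theta) / sigma**2                  # d/dtheta log N(x; theta, sigma^2)
    return np.mean((loss - baseline) * score)

rng = np.random.default_rng(0)
theta0, target = 0.2, 1.0

# Compare against the closed-form gradient, which exists only for this toy loss.
g_spg = spg_gradient(theta0, target, rng)
g_exact = 2.0 * (theta0 - target)

# A few stochastic gradient steps using small batches.
theta = theta0
for _ in range(200):
    theta -= 0.05 * spg_gradient(theta, target, rng, batch=1000)
```

In this toy setting the estimate should agree closely with the closed-form gradient, and the parameter should drift towards `target`. A reparametrization-based estimator would instead differentiate through the channel, which is exactly what a model-free setup cannot do. The score-function estimate typically has higher variance, which is consistent with the slower convergence reported in the abstract.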