In this paper, we develop a deep learning-based semantic communication system for speech transmission, named DeepSC-ST. We take speech recognition and speech synthesis as the transmission tasks of the communication system. First, a joint semantic-channel encoder extracts the speech recognition-related semantic features for transmission, and the receiver recovers the text from the received semantic features, which significantly reduces the amount of data to be transmitted without performance degradation. Then, we perform speech synthesis at the receiver, which regenerates the speech signals by feeding the recognized text and the speaker information into a neural network module. To make DeepSC-ST adaptive to dynamic channel environments, we identify a robust model that copes with different channel conditions. According to the simulation results, the proposed DeepSC-ST significantly outperforms conventional communication systems and existing deep learning-enabled communication systems, especially in the low signal-to-noise ratio (SNR) regime. A software demonstration is further developed as a proof-of-concept of DeepSC-ST.
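To make the notion of an SNR regime concrete, the following is a minimal sketch of how transmitted features could be corrupted by channel noise at a chosen SNR. It assumes a real-valued additive white Gaussian noise (AWGN) channel, which is an illustrative choice and not necessarily the channel model used for DeepSC-ST. The function names `awgn_channel` and `measured_snr_db` and the random placeholder features are hypothetical and stand in for the output of the joint semantic-channel encoder.

```python
import math
import random


def awgn_channel(symbols, snr_db, rng=None):
    """Add real-valued white Gaussian noise so the result has the requested SNR (dB).

    Assumption: an AWGN channel; the noise power is derived from the
    average power of the input symbols.
    """
    rng = rng or random.Random()
    signal_power = sum(s * s for s in symbols) / len(symbols)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in symbols]


def measured_snr_db(clean, noisy):
    """Estimate the SNR (dB) from the clean symbols and their noisy copies."""
    signal_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum((n - s) ** 2 for s, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Placeholder "semantic features": random values, not real encoder output.
rng = random.Random(0)
features = [rng.uniform(-1.0, 1.0) for _ in range(20000)]

# A low-SNR case (0 dB): noise power equals signal power.
received = awgn_channel(features, snr_db=0.0, rng=rng)
```

Calling `measured_snr_db(features, received)` gives an empirical check that the noise level matches the requested SNR. In a learned system, a receiver network would take `received` as input; that part is omitted here.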