Error-resilience tools such as Packet Loss Concealment (PLC) and Forward Error Correction (FEC) are essential for maintaining reliable speech communication in applications like Voice over Internet Protocol (VoIP), where packets are frequently delayed or lost. End-to-end neural speech codecs have recently risen to prominence because they can transmit speech at low bitrates, but little consideration has been given to their error resilience in a real system. The recently introduced Neural End-to-End Speech Codec (NESC) can reproduce high-quality natural speech at low bitrates. We extend its robustness to packet losses by adding a low-complexity network that predicts the codebook indices in the latent space. Furthermore, we propose a method to add in-band FEC at an additional bitrate of 0.8 kbps. Both subjective and objective assessments indicate the effectiveness of the proposed methods and demonstrate that coupling PLC and FEC provides significant robustness against packet losses.
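The way PLC and in-band FEC can complement each other at the receiver can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: each packet is assumed to piggyback redundant codebook indices for the previous frame, the receiver is assumed to tolerate one frame of look-ahead, and `predict_indices` is a trivial repeat-last-frame stand-in rather than the predictive network. The names `Packet`, `fec_indices`, `predict_indices`, and `recover` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Packet:
    indices: List[int]      # primary codebook indices for frame n
    fec_indices: List[int]  # assumed redundant indices for frame n-1


def predict_indices(history: Sequence[List[int]]) -> List[int]:
    """Placeholder PLC: repeat the last decoded frame's indices.

    A learned predictor would go here; this stand-in only keeps the
    sketch runnable.
    """
    return list(history[-1]) if history else [0]


def recover(received: Sequence[Optional[Packet]]) -> List[Tuple[str, List[int]]]:
    """Choose a source of indices for every frame; None marks a lost packet."""
    out: List[Tuple[str, List[int]]] = []
    history: List[List[int]] = []
    for n, pkt in enumerate(received):
        if pkt is not None:
            source, idx = "primary", pkt.indices
        else:
            # Look one packet ahead for the redundant copy of this frame.
            nxt = received[n + 1] if n + 1 < len(received) else None
            if nxt is not None:
                source, idx = "fec", nxt.fec_indices
            else:
                # No redundancy available: fall back to concealment.
                source, idx = "plc", predict_indices(history)
        history.append(list(idx))
        out.append((source, list(idx)))
    return out


stream = [
    Packet([1, 2], [0, 0]),
    None,                    # single loss, recoverable from the next packet
    Packet([5, 6], [3, 4]),
    None,                    # first of two consecutive losses, needs PLC
    None,                    # second loss, recoverable from the next packet
    Packet([9, 9], [7, 8]),
]
result = recover(stream)
print([source for source, _ in result])
```

In this sketch an isolated loss is repaired from the redundant copy, while the first frame of a burst falls back to concealment, which is the kind of case where combining the two tools helps.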