While recent neural network architectures have greatly advanced the performance of offline neural speech separation systems, a performance gap typically remains between these systems and their online variants. In this paper, we investigate how RNN-based offline neural speech separation systems can be converted into their online counterparts while mitigating the performance degradation. We decompose or reorganize the forward and backward RNN layers in a bidirectional RNN layer to form an online path and an offline path, which enables the model to perform both online and offline processing with the same set of model parameters. We further introduce two training strategies for improving the online model, using either a pretrained offline model or a multitask training objective. Experimental results show that, compared to online models trained from scratch, the proposed layer decomposition and reorganization schemes and training strategies effectively reduce the performance gap between two RNN-based offline separation models and their online variants.
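The idea of sharing one set of parameters between an online and an offline path can be illustrated with a minimal sketch. It is not the authors' implementation: it shows only one plausible reading of the decomposition scheme, in which the offline path runs both directions of a bidirectional layer and the online path reuses the forward direction alone. The tiny tanh cell, the helper names (`make_cell`, `run_rnn`, `offline_path`, `online_path`, `multitask_loss`), and the weighted-sum form of the multitask objective are all assumptions made for illustration.

```python
import math
import random


def make_cell(input_size, hidden_size, rng):
    """Random weights for a tiny tanh RNN cell (hypothetical helper)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {"w_in": mat(hidden_size, input_size),
            "w_rec": mat(hidden_size, hidden_size)}


def run_rnn(cell, frames, reverse=False):
    """Run the cell over all frames and return one hidden state per frame."""
    hidden_size = len(cell["w_in"])
    h = [0.0] * hidden_size
    n = len(frames)
    order = range(n - 1, -1, -1) if reverse else range(n)
    out = [None] * n
    for t in order:
        x = frames[t]
        h = [
            math.tanh(
                sum(w * v for w, v in zip(cell["w_in"][i], x))
                + sum(w * v for w, v in zip(cell["w_rec"][i], h))
            )
            for i in range(hidden_size)
        ]
        out[t] = h
    return out


def offline_path(fwd_cell, bwd_cell, frames):
    """Bidirectional pass: concatenate forward and backward states per frame."""
    fwd = run_rnn(fwd_cell, frames)
    bwd = run_rnn(bwd_cell, frames, reverse=True)
    return [f + b for f, b in zip(fwd, bwd)]


def online_path(fwd_cell, frames):
    """Causal pass: reuse only the forward cell, so no future frames are needed."""
    return run_rnn(fwd_cell, frames)


def multitask_loss(offline_loss, online_loss, weight=0.5):
    """Assumed form of a multitask objective: a weighted sum of both path losses."""
    return weight * offline_loss + (1.0 - weight) * online_loss


if __name__ == "__main__":
    rng = random.Random(0)
    fwd_cell = make_cell(3, 4, rng)
    bwd_cell = make_cell(3, 4, rng)
    frames = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
    print(len(offline_path(fwd_cell, bwd_cell, frames)[0]))  # offline feature size
    print(len(online_path(fwd_cell, frames)[0]))             # online feature size
```

Because both paths read the same `fwd_cell`, training either path updates parameters the other one uses, which is what makes a joint objective possible in this sketch. The reorganization scheme mentioned in the abstract is not covered here, since the abstract does not describe how the layers are rearranged.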