The cocktail party problem is the difficulty of separating or distinguishing an individual speaker from a mixture of several speakers' speech. Research in this field is ongoing, but model size and complexity are typically traded off against the accuracy and robustness of speech separation. "Monaural multi-speaker speech separation" presents a speech-separation model based on the Transformer architecture and its efficient variants. The model is trained on the LibriMix dataset, which contains utterances from diverse speakers, and it separates two distinct speaker sources from a mixed audio input. The model aims to reduce the computational complexity of speech separation with minimal loss in performance relative to prevalent speech separation models, and it has shown significant progress towards that goal. This project is expected to contribute to ongoing research in speech separation with computational efficiency at its core.