The cocktail party problem is the scenario in which it is difficult to separate or distinguish an individual speaker from speech mixed from several speakers. Research in this field is ongoing, but the size and complexity of a model are typically traded off against the accuracy and robustness of speech separation. "Monaural multi-speaker speech separation" presents a speech-separation model based on the Transformer architecture and its efficient forms. The model has been trained on the LibriMix dataset, which contains utterances from diverse speakers, and it separates two distinct speaker sources from a mixed audio input. The developed model aims to reduce the computational complexity of speech separation with minimal loss in performance relative to prevalent speech separation models, and it has shown significant progress towards that goal. This project anticipates a rise in contributions to the ongoing research in speech separation with computational efficiency at its core.