We propose a data cleansing method that uses the neural analysis and synthesis (NANSY++) framework to train an end-to-end neural diarization (EEND) model for singer diarization. The proposed method converts song data containing choral singing, which is common in popular music but unsuitable for generating a simulated dataset, into solo singing data. The cleansing relies on NANSY++, a framework trained to reconstruct a non-overlapped input audio signal. We exploit a pre-trained NANSY++ model to convert choral singing into clean, non-overlapped audio. This cleansing mitigates the mislabeling of choral singing as solo singing and enables effective training of EEND models even when most of the available song data contains choral singing sections. We experimentally evaluated an EEND model trained on a dataset cleansed with the proposed method, using annotated popular duet songs. The proposed method improved the diarization error rate by 14.8 points.
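For readers unfamiliar with the metric, the following is a minimal sketch of a frame-level diarization error rate of the kind commonly reported for EEND-style models. It is an illustration under stated assumptions, not the scoring procedure used in this work: the abstract does not specify collars, frame length, or the scoring tool, and the function name `frame_level_der` and the toy labels are hypothetical. Each frame holds one 0/1 activity flag per singer, and the score is taken under the best assignment of predicted singers to reference singers.

```python
from itertools import permutations


def frame_level_der(reference, hypothesis):
    """Frame-level DER sketch (hypothetical helper, not the paper's scorer).

    reference, hypothesis: sequences of frames; each frame is a tuple of
    0/1 activity flags, one per singer. Returns (miss + false alarm +
    confusion) / total reference activity, minimized over singer permutations.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    total_ref = sum(sum(frame) for frame in reference)
    if total_ref == 0:
        raise ValueError("reference contains no active singer frames")

    n_singers = len(reference[0])
    best_errors = None
    for perm in permutations(range(n_singers)):
        errors = 0
        for ref, hyp in zip(reference, hypothesis):
            hyp_perm = [hyp[j] for j in perm]
            n_ref, n_sys = sum(ref), sum(hyp_perm)
            n_correct = sum(r & h for r, h in zip(ref, hyp_perm))
            # miss + false alarm + confusion collapses to this expression
            errors += max(n_ref, n_sys) - n_correct
        if best_errors is None or errors < best_errors:
            best_errors = errors
    return best_errors / total_ref


# Toy duet: the third frame is overlapped singing, which the hypothesis
# labels as a single singer (singer indices are also swapped).
reference = [(1, 0), (1, 0), (1, 1), (0, 1)]
hypothesis = [(0, 1), (0, 1), (0, 1), (1, 0)]
der = frame_level_der(reference, hypothesis)
print(f"DER = {der:.2f}")
```

In this toy case the only error left after resolving the singer permutation is the overlapped frame that was predicted as solo singing, which is the kind of overlap-related error the abstract is concerned with.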