In supervised speech separation, permutation invariant training (PIT) is widely used to handle label ambiguity by selecting the best permutation of label assignments to update the model. Despite its success, previous studies have shown that PIT suffers from excessive label assignment switching between adjacent epochs, which prevents the model from learning better label assignments. To address this issue, we propose a novel training strategy, dynamic sample dropout (DSD), which uses previous best label assignments and evaluation metrics to exclude samples that may negatively impact the learned label assignments during training. Additionally, we include layer-wise optimization (LO) to improve performance by solving the layer-decoupling problem. Our experiments show that combining DSD and LO outperforms the baseline and solves both the excessive label assignment switching and the layer-decoupling issues. The proposed DSD and LO approach is easy to implement, requires no extra training sets or steps, and generalizes across various speech separation tasks.
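To make the PIT selection step and the notion of label assignment switching concrete, the following is a minimal sketch and not the implementation used in this work. It assumes mean squared error as a stand-in training loss, and the function names `pit_assign` and `switched` are hypothetical. It does not implement DSD or LO, whose exact rules are not specified here.

```python
from itertools import permutations

import numpy as np


def pit_assign(est, ref):
    """Return (best_perm, best_loss) for one training sample.

    est, ref: arrays of shape (n_src, n_samples).
    best_perm[i] is the index of the reference paired with estimate i.
    MSE is an illustrative stand-in for the actual training loss.
    """
    n_src = ref.shape[0]
    best_perm, best_loss = None, np.inf
    # Exhaustive search over all n_src! assignments (fine for 2-3 speakers).
    for perm in permutations(range(n_src)):
        loss = float(np.mean((est - ref[list(perm)]) ** 2))
        if loss < best_loss:
            best_perm, best_loss = perm, loss
    return best_perm, best_loss


def switched(prev_perm, curr_perm):
    """True if a sample's best assignment changed since the previous epoch."""
    return prev_perm is not None and prev_perm != curr_perm


# Two toy sources; the estimates come out in swapped order.
ref = np.array([[1.0, 0.0, 0.0, 0.0],
                [0.0, 0.0, 0.0, 1.0]])
est = ref[::-1] + 0.01
perm, loss = pit_assign(est, ref)
print(perm, switched((0, 1), perm))
```

Storing the best permutation per sample at each epoch and comparing it with the previous one, as `switched` does, is one simple way to measure how often assignments flip between adjacent epochs.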