This paper proposes a novel confidence-score-guided incremental and speaker-adaptive pseudo-labeling approach for semi-supervised elderly speech recognition. It enables higher-quality pseudo-label selection and progressive pseudo-label refinement while also mitigating speaker heterogeneity. A confidence estimation module ranks the reliability of untranscribed data, enabling a curriculum learning trajectory that progressively folds in unlabeled data subsets from high to low confidence. Speaker-specific characteristics are captured through speaker adaptive training with learnable prompts. Experiments on the English DementiaBank Pitt and Cantonese JCCOCC MoCA elderly speech datasets suggest that the proposed method outperforms the semi-supervised baseline, which uses neither confidence-score-guided incremental nor speaker-adaptive pseudo-labeling, with statistically significant word error rate (WER) and character error rate (CER) reductions of 1.45% and 2.27% absolute (6.21% and 6.98% relative) on the two datasets, respectively.
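The confidence-guided curriculum can be illustrated with a minimal sketch, assuming each untranscribed utterance already has a pseudo-label and a scalar confidence score from some confidence estimation module. The `Utterance` type, the `curriculum_stages` function, and the choice to split the ranked pool into equal-sized subsets are hypothetical illustrations, not the paper's implementation. The sketch covers only the high-to-low confidence selection schedule; retraining the model and regenerating pseudo-labels between stages, as well as the speaker-adaptive prompts, are omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    utt_id: str
    pseudo_label: str
    confidence: float  # hypothetical score from a confidence estimation module


def curriculum_stages(pool: List[Utterance], num_stages: int) -> List[List[Utterance]]:
    """Return one cumulative training set per stage, folding in data from high to low confidence.

    Assumption: the ranked pool is split into num_stages roughly equal subsets;
    stage k trains on the top k subsets.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be >= 1")
    # Rank untranscribed utterances by confidence, most reliable first.
    ranked = sorted(pool, key=lambda u: u.confidence, reverse=True)
    n = len(ranked)
    stages = []
    for k in range(1, num_stages + 1):
        # Cumulative cutoff: each stage adds the next lower-confidence subset.
        cutoff = (k * n) // num_stages
        stages.append(ranked[:cutoff])
    return stages


if __name__ == "__main__":
    scores = [0.9, 0.2, 0.7, 0.5, 0.95, 0.1]
    pool = [Utterance(f"u{i}", "", c) for i, c in enumerate(scores, start=1)]
    for k, stage in enumerate(curriculum_stages(pool, 3), start=1):
        print(k, [u.utt_id for u in stage])
```

With three stages, the first stage uses only the two most confident utterances (`u5`, `u1`), the second adds `u3` and `u4`, and the last includes the whole pool. A threshold-based split on the confidence values would be an equally plausible alternative to the equal-sized split assumed here.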