Dysarthric speech recognition (DSR) is challenging because of high inter-speaker variability, which causes severe performance degradation when DSR models are applied to new dysarthric speakers. Traditional speaker adaptation typically fine-tunes a model for each speaker, but this requires substantial data collection, which is costly and inconvenient for disabled users. To address this issue, we introduce a prototype-based approach that markedly improves DSR performance for unseen dysarthric speakers without additional fine-tuning. Our method employs a feature extractor trained with HuBERT to produce per-word prototypes that encapsulate the characteristics of previously unseen speakers. These prototypes serve as the basis for classification. Additionally, we incorporate supervised contrastive learning to refine feature extraction. By enhancing representation quality, we further improve DSR performance, enabling effective personalized DSR. We release our code at https://github.com/NKU-HLT/PB-DSR.
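The prototype-based classification described above can be illustrated with a minimal sketch. It is not the released implementation: the abstract does not say how prototypes are aggregated or compared, so averaging the embeddings of each word and scoring by cosine similarity are assumptions made here for illustration. The function names `build_prototypes` and `classify` are hypothetical, and the toy 2-D vectors stand in for features that would come from the HuBERT-based extractor.

```python
import numpy as np


def build_prototypes(embeddings, labels):
    """Average the embeddings of each word into one prototype per word.

    Assumption: mean pooling; the abstract does not specify the aggregation.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    words = sorted(set(labels.tolist()))
    prototypes = np.stack([embeddings[labels == w].mean(axis=0) for w in words])
    return words, prototypes


def classify(query, words, prototypes):
    """Return the word whose prototype is most similar to the query.

    Assumption: cosine similarity as the comparison metric.
    """
    query = np.asarray(query, dtype=float)
    q = query / np.linalg.norm(query)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return words[int(np.argmax(p @ q))]


# Toy stand-ins for extractor outputs: a few labeled utterances from a
# new speaker are enough to build prototypes, with no fine-tuning step.
enroll_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
enroll_labels = ["yes", "yes", "no", "no"]

words, prototypes = build_prototypes(enroll_embeddings, enroll_labels)
prediction = classify([1.0, 0.2], words, prototypes)
```

In this sketch, adapting to a new speaker only requires recomputing the prototypes from that speaker's labeled utterances; the feature extractor itself stays frozen.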