Automatic assessment of dysarthric speech is essential for sustained treatment and rehabilitation. However, atypical speech is difficult to collect, which often leads to data scarcity. To address this problem, we propose a novel automatic severity assessment method for dysarthric speech that combines a self-supervised model with multi-task learning. Wav2vec 2.0 XLS-R is jointly trained on two tasks: severity classification and auxiliary automatic speech recognition (ASR). For the baseline experiments, we employ hand-crafted acoustic features and machine learning classifiers such as SVM, MLP, and XGBoost. Evaluated on the Korean dysarthric speech QoLT database, our model outperforms the traditional baseline methods, with a relative increase of 1.25% in F1-score. In addition, the proposed model surpasses the model trained without the ASR head, achieving a relative improvement of 10.61%. Furthermore, we show how multi-task learning affects severity classification performance by analyzing the latent representations and the regularization effect.
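The gains above are reported as relative percentages, which differ from absolute differences in F1 points. As a minimal sketch of that arithmetic, the snippet below computes both for a pair of scores. The function name `relative_increase` and the two F1 values are hypothetical and chosen only for illustration; they are not the scores reported for this work.

```python
def relative_increase(baseline: float, proposed: float) -> float:
    """Return the change of `proposed` over `baseline` as a percentage of `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (proposed - baseline) / baseline * 100.0


# Hypothetical F1-scores (in percent) for illustration only;
# these are NOT the values reported in the paper.
baseline_f1 = 80.0
proposed_f1 = 81.0

absolute_gain = proposed_f1 - baseline_f1                    # difference in F1 points
relative_gain = relative_increase(baseline_f1, proposed_f1)  # difference relative to the baseline

print(f"absolute gain: {absolute_gain:.2f} points")
print(f"relative gain: {relative_gain:.2f}%")
```

With these illustrative inputs, a gain of one F1 point over a baseline of 80 corresponds to a relative increase of 1.25%, so the same relative figure implies a different absolute gain depending on the baseline score.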