In this study, we explore, for the first time, the effect of pre-trained conditional generative speech models on dysarthric speech due to Parkinson's disease recorded in ideal, noise-free conditions. We consider one category of generative models, diffusion-based speech enhancement, which are trained to learn the distribution of clean (i.e., recorded in a noise-free environment) typical speech signals. We therefore hypothesize that, when exposed to dysarthric speech, these models may remove the unseen atypical paralinguistic cues during enhancement. Using automatic dysarthric speech detection as the evaluation task, we show experimentally that some acoustic cues of dysarthria are lost when dysarthric speech recorded in an ideal, noise-free environment is enhanced. Such pre-trained models are therefore not yet suitable for dysarthric speech enhancement, since they alter pathological speech cues even when processing clean dysarthric speech. Furthermore, we show that the acoustic cues removed by the enhancement models, in the form of a residue speech signal, provide complementary dysarthric cues when fused with the original input speech signal in the feature space.
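The residue-and-fusion idea in the last sentence of the abstract can be illustrated with a minimal sketch. The abstract specifies neither the feature extractor nor the fusion operator, so everything below is an assumption made for illustration: the residue is taken as the sample-wise difference between the time-aligned input and enhanced signals, the features are simple log-magnitude spectrum statistics (the hypothetical helper `log_spectral_features`), and fusion is plain concatenation. The enhanced signal would come from a diffusion-based enhancement model, which is not reproduced here.

```python
import numpy as np


def residue_signal(original, enhanced):
    """Residue = input minus enhanced output, i.e. what enhancement removed.

    Assumption: both signals are time-aligned and share a sampling rate.
    """
    n = min(len(original), len(enhanced))
    return original[:n] - enhanced[:n]


def log_spectral_features(signal, frame_len=400, hop=160):
    """Hypothetical stand-in feature extractor (not the paper's features).

    Returns the per-bin mean and standard deviation of the log-magnitude
    spectrum over all frames.
    """
    if len(signal) < frame_len:
        # Zero-pad short signals so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    # Overlapping frames: one row per frame, stepping by `hop` samples.
    frames = np.lib.stride_tricks.sliding_window_view(signal, frame_len)[::hop]
    spectrum = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    log_spec = np.log(spectrum + 1e-8)  # small offset avoids log(0)
    return np.concatenate([log_spec.mean(axis=0), log_spec.std(axis=0)])


def fused_features(original, enhanced):
    """Feature-space fusion, assumed here to be simple concatenation."""
    residue = residue_signal(original, enhanced)
    return np.concatenate(
        [log_spectral_features(original), log_spectral_features(residue)]
    )
```

The fused vector would then be passed to a dysarthria detection classifier in place of features from the original signal alone. Concatenation is only one possible fusion choice; it is used here because it makes no further assumptions about the classifier.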