Automatic recognition of disordered and elderly speech remains a highly challenging task to date due to data scarcity. Parameter fine-tuning is often used to exploit models pre-trained on large quantities of non-aged, healthy speech, while neural architecture hyper-parameters are set using expert knowledge and remain unchanged. This paper investigates hyper-parameter adaptation for Conformer ASR systems that are pre-trained on the Librispeech corpus before being domain adapted to the DementiaBank (DBank) elderly and UASpeech dysarthric speech datasets. Experimental results suggest that hyper-parameter adaptation produces word error rate (WER) reductions of 0.45% and 0.67% over parameter-only fine-tuning on the DBank and UASpeech tasks, respectively. An intuitive correlation is found between the performance improvements from hyper-parameter domain adaptation and the relative utterance length ratio between the source and target domain data.