In existing work, intent is defined for the purpose of understanding spoken language. In medical speech, both textual and acoustic features carry intent, which is important for symptomatic diagnosis. In this paper, we propose DRSC, a medical speech classification model that automatically learns to disentangle intent and content representations from textual-acoustic data for classification. Intent encoders extract intent representations from the text domain and the Mel-spectrogram domain, and the text and Mel-spectrogram features are then reconstructed through two exchanges. The intent representations from the two domains are combined into a joint representation, which is fed into a decision layer for classification. Experimental results show that our model achieves an average accuracy of 95% in detecting 25 different medical symptoms.
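The data flow described above (per-domain intent and content encoding, exchange-based reconstruction, intent fusion, decision layer) can be sketched roughly as follows. This is a minimal illustration and not the DRSC implementation: the abstract does not specify layer types, dimensions, or losses, so the random linear maps, the names `DomainBranch`, `exchange`, and `classify`, and all sizes except the 25 classes are hypothetical. In particular, reading "two exchanges" as swapping the intent codes across domains, re-encoding, and swapping again is an assumption.

```python
import random


def linear(in_dim, out_dim, rng):
    """Random weight matrix standing in for a trained layer (hypothetical)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply(weights, x):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class DomainBranch:
    """One domain (text or Mel-spectrogram): intent encoder, content encoder, decoder."""

    def __init__(self, feat_dim, intent_dim, content_dim, rng):
        self.intent_enc = linear(feat_dim, intent_dim, rng)
        self.content_enc = linear(feat_dim, content_dim, rng)
        self.decoder = linear(intent_dim + content_dim, feat_dim, rng)

    def encode(self, x):
        return apply(self.intent_enc, x), apply(self.content_enc, x)

    def decode(self, intent, content):
        # Decoder input is the concatenation [intent; content].
        return apply(self.decoder, intent + content)


def exchange(text_feat, mel_feat, text_branch, mel_branch):
    """Assumed exchange step: decode each domain with the OTHER domain's intent."""
    text_intent, text_content = text_branch.encode(text_feat)
    mel_intent, mel_content = mel_branch.encode(mel_feat)
    return (text_branch.decode(mel_intent, text_content),
            mel_branch.decode(text_intent, mel_content))


def classify(text_feat, mel_feat, text_branch, mel_branch, decision):
    """Fuse both intent codes and pick the highest-scoring class index."""
    text_intent, _ = text_branch.encode(text_feat)
    mel_intent, _ = mel_branch.encode(mel_feat)
    joint = text_intent + mel_intent  # joint representation by concatenation
    logits = apply(decision, joint)
    return max(range(len(logits)), key=logits.__getitem__)


rng = random.Random(0)
TEXT_DIM, MEL_DIM, INTENT_DIM, CONTENT_DIM, NUM_CLASSES = 8, 12, 4, 3, 25
text_branch = DomainBranch(TEXT_DIM, INTENT_DIM, CONTENT_DIM, rng)
mel_branch = DomainBranch(MEL_DIM, INTENT_DIM, CONTENT_DIM, rng)
decision = linear(2 * INTENT_DIM, NUM_CLASSES, rng)

text_feat = [rng.random() for _ in range(TEXT_DIM)]
mel_feat = [rng.random() for _ in range(MEL_DIM)]

# Two exchanges: swap intents, then re-encode the results and swap again.
first = exchange(text_feat, mel_feat, text_branch, mel_branch)
text_recon, mel_recon = exchange(first[0], first[1], text_branch, mel_branch)
predicted = classify(text_feat, mel_feat, text_branch, mel_branch, decision)
```

With untrained random weights the predicted class is meaningless; the sketch only shows how the pieces connect. In a trained model one would expect reconstruction losses between `text_feat` and `text_recon` (and likewise for the Mel-spectrogram branch) to encourage the separation of intent from content, though the abstract does not state the training objective.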