The incidence of voice diseases is increasing year by year, and software-based remote diagnosis is a growing technical trend with significant practical value. Common voice diseases that cause hoarseness include spasmodic dysphonia, vocal cord paralysis, vocal nodules, and vocal cord polyps. This paper presents a voice disease detection method that can be applied in a wide range of clinical settings. In cooperation with Xiangya Hospital of Central South University, we collected voice samples from sixty-one patients. Mel-frequency cepstral coefficients (MFCCs) are extracted as input features that represent each voice sample numerically. We propose a model that feeds the MFCC features into a convolutional neural network (CNN) with a single convolutional layer, which keeps computation and classification fast. The highest accuracy we achieved on the clinical data was 92%, which surpasses earlier results and is comparable to the international state of the art. To evaluate the generalization ability of the proposed method, we also applied it to the Advanced Voice Function Assessment Databases (AVFAD), where it reached an accuracy of 98%. Experiments on both the clinical and the standard datasets show that, for the pathological detection of voice diseases, our method substantially improves accuracy and computational efficiency.
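The pipeline described in the abstract (MFCC extraction followed by a CNN with a single convolutional layer) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The sampling rate, frame length, hop size, number of mel filters, number of coefficients, kernel shape, the two-class output, and the function names `mfcc` and `single_conv_cnn` are all assumptions chosen for illustration, and the network weights are random rather than trained.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_coeffs=13):
    """Return an (n_frames, n_coeffs) MFCC matrix (assumed parameters)."""
    # Pre-emphasis boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice the signal into overlapping, Hamming-windowed frames.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum, mel filterbank energies, then log compression.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II (unnormalized) decorrelates the log energies.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (2 * n + 1)
                   / (2 * n_filters))
    return log_mel @ basis.T

def single_conv_cnn(features, kernels, conv_bias, dense_w, dense_b):
    """Forward pass: one convolution over time, ReLU, global max pooling,
    and a dense softmax layer. kernels has shape (n_kernels, width, n_coeffs)."""
    width = kernels.shape[1]
    n_out = features.shape[0] - width + 1
    windows = np.stack([features[t:t + width] for t in range(n_out)])
    conv = np.einsum("twc,kwc->tk", windows, kernels) + conv_bias
    pooled = np.maximum(conv, 0.0).max(axis=0)   # ReLU, then max over time
    logits = dense_w @ pooled + dense_b
    exp = np.exp(logits - logits.max())          # numerically stable softmax
    return exp / exp.sum()

# Synthetic one-second test tone standing in for a voice recording.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
signal = np.sin(2 * np.pi * 220.0 * t) + 0.01 * rng.standard_normal(16000)

feats = mfcc(signal)                             # shape (98, 13)
probs = single_conv_cnn(
    feats,
    kernels=rng.standard_normal((8, 5, 13)) * 0.1,   # untrained, random
    conv_bias=np.zeros(8),
    dense_w=rng.standard_normal((2, 8)) * 0.1,
    dense_b=np.zeros(2),
)
```

Here `probs` holds two class probabilities (for example healthy versus pathological, an assumed labeling). In practice the weights would be learned from labeled recordings, and a single convolutional layer keeps the parameter count and inference cost low, which is consistent with the abstract's emphasis on fast computation.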