Toward Real-World Voice Disorder Classification

Objective: Voice disorders significantly compromise individuals' ability to speak in their daily lives. Without early diagnosis and treatment, these disorders may deteriorate drastically. Automatic classification systems that can be used at home are therefore desirable for people who lack access to clinical assessment. However, the performance of such systems may be degraded by constrained computing resources and by the domain mismatch between clinical data and noisy real-world data. Methods: This study develops a compact and domain-robust voice disorder classification system that classifies utterances as healthy, neoplasm, or benign structural disease. The proposed system uses a feature extractor built from factorized convolutional neural networks and then applies domain adversarial training to reduce the domain mismatch by extracting domain-invariant features. Results: The unweighted average recall improved by 13% in the noisy real-world domain and remained at 80% in the clinic domain, with only slight degradation. The domain mismatch was effectively eliminated. Moreover, the proposed system reduced both memory usage and computation by over 73.9%. Conclusion: By deploying factorized convolutional neural networks and domain adversarial training, domain-invariant features can be derived for voice disorder classification with limited resources. The promising results confirm that the proposed system can significantly reduce resource consumption and improve classification accuracy by accounting for the domain mismatch. Significance: To the best of our knowledge, this is the first study to jointly consider real-world model compression and noise-robustness issues in voice disorder classification. The proposed system is intended for deployment on embedded systems with limited resources.

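The results above are reported as unweighted average recall (UAR), which is the mean of the per-class recalls with every class weighted equally regardless of its size. The following is a minimal sketch of that metric, not the authors' evaluation code. The function name, the class label strings, and the toy predictions are hypothetical and chosen only to show how UAR can differ from plain accuracy when the classes are imbalanced.

```python
from collections import Counter


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls, weighting each class equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    # Number of samples per true class.
    support = Counter(y_true)
    # Number of correctly predicted samples per true class.
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not data from the study).
y_true = ["healthy"] * 6 + ["neoplasm"] * 2 + ["structural"] * 2
y_pred = (
    ["healthy"] * 6
    + ["neoplasm", "healthy"]
    + ["structural", "healthy"]
)

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case the majority class is recognized perfectly while each minority class is recognized only half of the time, so accuracy (8 of 10 correct) is higher than UAR (the mean of 1.0, 0.5, and 0.5). This is why UAR is a common choice when disorder classes are less frequent than healthy samples.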