The detection of pathologies from speech features is usually framed as a binary classification task, with one class representing a specific pathology and the other representing healthy speech. In this work, we train neural networks, large margin classifiers, and tree boosting machines to distinguish between four pathologies: Parkinson's disease, laryngeal cancer, cleft lip and palate, and oral squamous cell carcinoma. We show that latent representations extracted at different layers of a pre-trained wav2vec 2.0 system can be used effectively to classify these types of pathological voices. We evaluate the robustness of our classifiers by adding room impulse responses to the test data and by applying the classifiers to unseen speech corpora. Our approach achieves unweighted average F1 scores between 74.1% and 97.0%, depending on the model and the noise conditions used. The systems generalize and perform well on unseen data from healthy speakers sampled from a variety of sources.
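For readers unfamiliar with the reported metric, the following is a minimal sketch of how an unweighted average F1 score can be computed. It assumes the term refers to the macro-averaged F1, where each class contributes equally regardless of its size. The function name `unweighted_average_f1` and the class labels in the usage note are hypothetical and are not taken from the original work.

```python
from collections import Counter


def unweighted_average_f1(y_true, y_pred):
    """Macro-averaged F1: per-class F1 scores averaged with equal weight.

    Assumption: "unweighted average" means every class counts equally,
    independent of how many samples it has.
    """
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("at least one sample is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    f1_scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)

    return sum(f1_scores) / len(f1_scores)
```

As a usage example with hypothetical labels, `unweighted_average_f1(["pd", "pd", "lc", "clp"], ["pd", "lc", "lc", "clp"])` averages the per-class scores 2/3, 2/3, and 1. Because small classes weigh as much as large ones, this metric penalizes a classifier that ignores a rare pathology more strongly than plain accuracy would.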