This study presents a novel transfer learning approach and data augmentation technique for mental stability classification from human voice signals, addressing the challenges posed by limited data availability. Convolutional neural networks (CNNs) were employed to analyse spectrogram images generated from voice recordings. Three CNN architectures, VGG16, InceptionV3, and DenseNet121, were evaluated across three experimental phases: training on non-augmented data, training on augmented data, and transfer learning. The proposed transfer learning approach pre-trains the models on the augmented dataset and fine-tunes them on the non-augmented dataset, while enforcing strict data separation to prevent data leakage. The results demonstrate significant improvements in classification performance over the baseline approach. Among the three architectures, DenseNet121 achieved the highest accuracy (94%) and AUC (99%) with the proposed transfer learning approach. These findings highlight the effectiveness of combining data augmentation and transfer learning to enhance CNN-based classification of mental stability from voice spectrograms, offering a promising non-invasive tool for mental health diagnostics.
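The "strict data separation to prevent data leakage" step can be made concrete with a minimal sketch: the key is to split into train and test sets *first* and augment only the training side, so no augmented copy of a held-out sample reaches the model during either pre-training or fine-tuning. The function name, the additive-noise `jitter` augmentation, and all parameter values below are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

def split_then_augment(samples, labels, test_frac=0.2, n_aug=3, seed=0):
    """Split FIRST, then augment only the training portion, so that no
    augmented variant of a test sample ever leaks into training data.
    (Illustrative sketch; the paper's actual augmentation is not shown.)"""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    n_test = int(len(samples) * test_frac)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def jitter(x):
        # Hypothetical augmentation: small additive noise on a spectrogram.
        return x + rng.normal(0.0, 0.01, size=x.shape)

    train_x, train_y = [], []
    for i in train_idx:
        train_x.append(samples[i])
        train_y.append(labels[i])
        for _ in range(n_aug):                 # n_aug synthetic copies each
            train_x.append(jitter(samples[i]))
            train_y.append(labels[i])

    test_x = [samples[i] for i in test_idx]    # test set stays un-augmented
    test_y = [labels[i] for i in test_idx]
    return (train_x, train_y), (test_x, test_y), train_idx, test_idx
```

Under this protocol, the CNN would first be pre-trained on the augmented `train` split and then fine-tuned on its non-augmented subset, with the test split held out from both phases.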