Speech and language technologies offer valuable opportunities for supporting mental health assessment through objective and interpretable cues. We present a systematic feature-based analysis framework leveraging perceptually grounded acoustic and linguistic characteristics, including prosody, vocal quality, semantic coherence, syntactic structure, and sarcasm. Using statistical analysis and interpretable machine learning (XGBoost with SHAP and LIME), we examine associations between speech features and validated symptom measures of depression, anxiety, and ADHD. Evaluated on both controlled benchmark datasets (StressID, DAIC-WOZ, Androids, EATD) and a real-world clinical dataset, the framework reveals consistent relationships between symptom severity and vocal irregularities (e.g., shimmer, jitter), lexical-syntactic patterns, and affective tone. An ablation study conducted across all datasets further identifies the most informative feature groups. This work explores a transparent and clinically interpretable approach to speech-based mental health analysis.
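To make the vocal irregularity measures named above concrete, the following is a minimal sketch of the commonly used "local" definitions of jitter and shimmer: the mean absolute difference between consecutive glottal cycle periods (jitter) or peak amplitudes (shimmer), divided by the mean value. It is not taken from the framework itself. The function names and the example measurements are hypothetical, and the sketch assumes that cycle periods and amplitudes have already been extracted by a pitch tracker.

```python
from statistics import fmean


def local_perturbation(values):
    """Mean absolute difference of consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return fmean(diffs) / fmean(values)


def jitter_local(periods_s):
    """Cycle-to-cycle variation of the glottal period (durations in seconds)."""
    return local_perturbation(periods_s)


def shimmer_local(peak_amplitudes):
    """Cycle-to-cycle variation of the per-cycle peak amplitude."""
    return local_perturbation(peak_amplitudes)


# Hypothetical per-cycle measurements, for illustration only (not from any dataset).
periods = [0.0100, 0.0102, 0.0099, 0.0101]
amplitudes = [0.80, 0.78, 0.82, 0.79]
print(f"jitter (local):  {jitter_local(periods):.4f}")
print(f"shimmer (local): {shimmer_local(amplitudes):.4f}")
```

Both measures are zero for a perfectly periodic signal and grow as consecutive cycles differ more. That scale-free behavior is one reason such features are easy to interpret alongside attribution methods like SHAP and LIME.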