Large Language Models (LLMs) have demonstrated potential in predicting mental health outcomes from online text, yet traditional classification methods often lack interpretability and robustness. This study evaluates structured reasoning techniques, namely Chain-of-Thought (CoT), Self-Consistency (SC-CoT), and Tree-of-Thought (ToT), for improving classification accuracy across multiple mental health datasets sourced from Reddit. We analyze reasoning-driven prompting strategies, including Zero-shot CoT and Few-shot CoT, using key performance metrics such as Balanced Accuracy, F1 score, and Sensitivity/Specificity. Our findings indicate that reasoning-enhanced techniques improve classification performance over direct prediction, particularly in complex cases. Compared with baselines such as zero-shot non-CoT prompting, fine-tuned pre-trained transformers such as BERT and Mental-RoBERTa, and fine-tuned open-source LLMs such as Mental Alpaca and Mental-Flan-T5, reasoning-driven LLMs yield notable gains on datasets such as Dreaddit (+0.52\% over M-LLM, +0.82\% over BERT) and SDCNL (+4.67\% over M-LLM, +2.17\% over BERT). However, performance declines on the Depression Severity and CSSRS prediction tasks suggest dataset-specific limitations, likely stemming from our use of a larger test set. Among prompting strategies, Few-shot CoT consistently outperforms the others, reinforcing the effectiveness of reasoning-driven LLMs. Nonetheless, dataset variability highlights challenges in model reliability and interpretability. This study provides a comprehensive benchmark of reasoning-based LLM techniques for mental health text classification, offering insights into their potential for scalable clinical applications while identifying key challenges for future improvement.