Automated mental health analysis shows great potential for enhancing the efficiency and accessibility of mental health care, and recent dominant methods use pre-trained language models (PLMs) as the backbone and incorporate emotional information. The latest large language models (LLMs), such as ChatGPT, exhibit strong capabilities on diverse natural language processing tasks. However, existing studies of ChatGPT's zero-shot performance on mental health analysis are limited by inadequate evaluation, insufficient use of emotional information, and a lack of explainability. In this work, we comprehensively evaluate the mental health analysis and emotional reasoning abilities of ChatGPT on 11 datasets across 5 tasks: binary and multi-class mental health condition detection, cause/factor detection of mental health conditions, emotion recognition in conversations, and causal emotion entailment. We empirically analyze how different prompting strategies with emotional cues affect ChatGPT's mental health analysis ability and explainability. Experimental results show that ChatGPT outperforms traditional neural network methods but still lags significantly behind advanced task-specific methods. Qualitative analysis shows its potential for explainability compared with advanced black-box methods, but also reveals limitations in robustness and reasoning accuracy. Prompt engineering with emotional cues is found to be effective in improving its performance on mental health analysis, but it requires emotional information to be infused in an appropriate way.