The latest large language models (LLMs), such as ChatGPT, exhibit strong capabilities in automated mental health analysis. However, existing studies have several limitations, including inadequate evaluations, a lack of prompting strategies, and neglect of LLMs' potential for explainability. To bridge these gaps, we comprehensively evaluate the mental health analysis and emotional reasoning ability of LLMs on 11 datasets across 5 tasks. We explore the effects of different prompting strategies with unsupervised and distantly supervised emotional information. Based on these prompts, we explore LLMs for interpretable mental health analysis by instructing them to generate explanations for each of their decisions. We conduct strict human evaluations to assess the quality of the generated explanations, leading to a novel dataset with 163 human-assessed explanations. We benchmark existing automatic evaluation metrics on this dataset to guide future related work. According to the results, ChatGPT shows strong in-context learning ability but still lags significantly behind advanced task-specific methods. Careful prompt engineering with emotional cues and expert-written few-shot examples can also effectively improve performance on mental health analysis. In addition, ChatGPT generates explanations that approach human performance, showing its great potential in explainable mental health analysis.