Large language models (LLMs) have been successful in several natural language understanding tasks and could be relevant for natural language processing (NLP)-based mental health application research. In this work, we report the performance of the LLM-based ChatGPT (with the gpt-3.5-turbo backend) on three text-based mental health classification tasks: stress detection (2-class classification), depression detection (2-class classification), and suicidality detection (5-class classification). We obtained annotated social media posts for the three tasks from public datasets, then used the ChatGPT API to classify each post in a zero-shot setting with an input prompt for classification. ChatGPT achieved F1 scores of 0.73, 0.86, and 0.37 for stress detection, depression detection, and suicidality detection, respectively. By comparison, a baseline model that always predicted the dominant class achieved F1 scores of 0.35, 0.60, and 0.19, respectively. The zero-shot classification performance obtained with ChatGPT indicates a potential use of language models for mental health classification tasks.
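To make the dominant-class baseline concrete, the following is a minimal sketch of how such a baseline and its F1 score could be computed. The abstract does not state which F1 averaging scheme was used, so the sketch shows both macro and support-weighted averaging as assumptions. The function names and the toy labels are hypothetical and are not taken from the datasets used in this work.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Per-class F1 weighted by class support in y_true (alternative assumption)."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * n for c, n in support.items()) / len(y_true)


def dominant_class_baseline(y_true):
    """Predict the most frequent label for every example."""
    dominant = Counter(y_true).most_common(1)[0][0]
    return [dominant] * len(y_true)


# Hypothetical toy labels for a 2-class task (not real data).
y_true = ["stress"] * 6 + ["no_stress"] * 4
baseline_pred = dominant_class_baseline(y_true)

print("macro F1:   ", macro_f1(y_true, baseline_pred))
print("weighted F1:", weighted_f1(y_true, baseline_pred))
```

On this toy 60/40 split, the dominant class gets an F1 of 0.75 and the other class gets 0, so the two averaging schemes give different baseline scores. This is why the averaging choice matters when comparing a model against such a baseline.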