This paper presents a novel framework for quantitatively evaluating the interactive ChatGPT model on suicidality assessment from social media posts, using the University of Maryland Reddit suicidality dataset. We evaluate ChatGPT's performance on this task in zero-shot and few-shot experiments and compare its results with those of two fine-tuned transformer-based models. We also investigate how different temperature settings affect ChatGPT's response generation and identify the optimal temperature based on ChatGPT's inconclusiveness rate. Our results indicate that while ChatGPT attains considerable accuracy on this task, transformer-based models fine-tuned on human-annotated datasets perform better. Moreover, our analysis sheds light on how adjusting ChatGPT's hyperparameters can improve its ability to assist mental health professionals in this critical task.
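The inconclusiveness rate used to choose a temperature can be illustrated with a minimal sketch. It assumes the rate is the fraction of responses that cannot be mapped to exactly one risk label; the abstract does not define it, so this reading, the label names in `LABELS`, and the helper functions below are all hypothetical and are not the paper's implementation.

```python
from collections import defaultdict

# Hypothetical label set (assumption); the dataset's actual label names may differ.
LABELS = ("no risk", "low risk", "moderate risk", "severe risk")


def parse_label(response):
    """Return the single label named in a response, or None if inconclusive."""
    text = response.lower()
    found = [label for label in LABELS if label in text]
    # Zero labels or several competing labels both count as inconclusive.
    return found[0] if len(found) == 1 else None


def inconclusiveness_rate(responses):
    """Fraction of responses that cannot be mapped to exactly one label."""
    if not responses:
        return 0.0
    inconclusive = sum(1 for r in responses if parse_label(r) is None)
    return inconclusive / len(responses)


def rate_by_temperature(runs):
    """Map each temperature to its inconclusiveness rate.

    runs: iterable of (temperature, response_text) pairs.
    """
    grouped = defaultdict(list)
    for temperature, response in runs:
        grouped[temperature].append(response)
    return {t: inconclusiveness_rate(rs) for t, rs in sorted(grouped.items())}
```

Under this assumed definition, one would pick the temperature with the lowest rate, subject to accuracy on the conclusive responses remaining acceptable.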