Static tools such as the Patient Health Questionnaire-9 (PHQ-9) screen effectively for depression but lack interactivity and adaptability. We developed HopeBot, a chatbot powered by a large language model (LLM) that administers the PHQ-9 using retrieval-augmented generation and real-time clarification. In a within-subject study, 132 adults in the United Kingdom and China completed both the self-administered and the chatbot versions. Scores showed strong agreement (ICC = 0.91; 45% of scores identical). Among the 75 participants who provided comparative feedback, 71% reported greater trust in the chatbot, citing its clearer structure, interpretive guidance, and supportive tone. Mean ratings (0-10) were 8.4 for comfort, 7.7 for voice clarity, 7.6 for handling sensitive topics, and 7.4 for recommendation helpfulness; the last of these varied significantly by employment status and prior mental-health service use (p < 0.05). Overall, 87.1% expressed willingness to reuse or recommend HopeBot. These findings demonstrate that voice-based LLM chatbots can feasibly serve as scalable, low-burden adjuncts for routine depression screening.
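The two agreement figures reported in the abstract (an intraclass correlation coefficient and the share of identical scores) can be illustrated with a minimal sketch. The abstract does not say which ICC form was used, so the choice of ICC(2,1) (two-way random effects, absolute agreement, single measurement) is an assumption here, as are the function names. The paired scores are invented toy data, not study results.

```python
from statistics import mean


def icc_2_1(pairs):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `pairs` is a list of per-participant tuples, one score per administration
    mode, e.g. (self_administered_total, chatbot_total).
    Assumption: the source does not state which ICC form it reports.
    """
    n = len(pairs)       # number of participants
    k = len(pairs[0])    # number of administration modes (2 here)
    grand = mean(x for row in pairs for x in row)
    row_means = [mean(row) for row in pairs]
    col_means = [mean(row[j] for row in pairs) for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in pairs for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-participant mean square
    msc = ss_cols / (k - 1)               # between-mode mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def exact_agreement(pairs):
    """Fraction of participants whose two total scores are identical."""
    return sum(1 for a, b in pairs if a == b) / len(pairs)


# Invented toy PHQ-9 totals (range 0-27): (self-administered, chatbot).
toy_pairs = [(3, 3), (10, 12), (7, 7), (15, 14), (0, 1), (21, 21)]

if __name__ == "__main__":
    print(f"ICC(2,1) = {icc_2_1(toy_pairs):.3f}")
    print(f"Identical scores = {exact_agreement(toy_pairs):.0%}")
```

A high ICC alongside a moderate share of identical scores, as in the abstract, is plausible because the ICC rewards small discrepancies between modes, while exact agreement counts only perfect matches.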