Agents are among the most prominent emerging applications of Large Language Models (LLMs) and Generative AI, and their effectiveness hinges on multimodal capabilities for navigating complex user environments. Conversational Health Agents (CHAs), a prime example, are redefining healthcare by offering nuanced support that goes beyond textual analysis to incorporate emotional intelligence. This paper introduces an LLM-based CHA engineered for rich, multimodal dialogue, especially in the realm of mental health support. It interprets and responds to users' emotional states by analyzing multimodal cues, delivering contextually aware and empathetically resonant verbal responses. Our implementation leverages the versatile openCHA framework, and our comprehensive evaluation uses neutral prompts expressed in diverse emotional tones: sadness, anger, and joy. We evaluate the consistency and repeatability of the planning capability of the proposed CHA. Furthermore, human evaluators critique the CHA's empathic delivery, and the findings reveal a striking concordance between the CHA's outputs and the evaluators' assessments. These results affirm the indispensable role of vocal (soon multimodal) emotion recognition in strengthening the empathetic connection built by CHAs, placing them at the forefront of interactive, compassionate digital health solutions.