Large language models (LLMs) are increasingly being integrated into clinical workflows, yet they often lack clinical empathy, an essential aspect of effective doctor-patient communication. Existing NLP frameworks focus on reactively labeling empathy in doctors' responses but offer limited support for anticipating empathy needs, especially in general health queries. We introduce the Empathy Applicability Framework (EAF), a theory-driven approach that classifies patient queries by whether emotional reactions and interpretations are applicable, based on clinical, contextual, and linguistic cues. We release a benchmark of real patient queries, dual-annotated by humans and GPT-4o. On the subset where human annotators reach consensus, we observe substantial human-GPT alignment. To validate EAF, we train classifiers on human-labeled and GPT-only annotations to predict empathy applicability; they achieve strong performance and outperform heuristic and zero-shot LLM baselines. Error analysis highlights three persistent challenges (implicit distress, ambiguity in clinical severity, and contextual hardship), underscoring the need for multi-annotator modeling, clinician-in-the-loop calibration, and culturally diverse annotation. EAF provides a way to identify empathy needs before a response is generated, establishes a benchmark for anticipatory empathy modeling, and supports empathetic communication in asynchronous healthcare.
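The abstract reports human-GPT alignment on the human-consensus subset but does not name the agreement statistic. The sketch below assumes Cohen's kappa, a common chance-corrected measure for two annotators. It shows how such alignment could be computed over one annotation dimension. The label names `applicable` and `not_applicable` and the example labels are hypothetical, used only for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items.

    Assumption: Cohen's kappa is used here for illustration; the abstract
    does not specify which agreement statistic was reported.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance overlap given each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label everywhere; kappa is
        # undefined, so by convention treat this as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical labels for six consensus-subset queries.
    human = ["applicable", "applicable", "not_applicable",
             "not_applicable", "applicable", "not_applicable"]
    gpt4o = ["applicable", "applicable", "not_applicable",
             "applicable", "applicable", "not_applicable"]
    print(round(cohens_kappa(human, gpt4o), 3))  # 0.667
```

In this toy example the annotators agree on 5 of 6 items, but chance agreement is 0.5, so kappa is about 0.67 rather than 0.83. Correcting for chance in this way is why kappa is often preferred over raw percent agreement when label distributions are skewed.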