Objective: Emergency medical dispatch (EMD) is a high-stakes process challenged by caller distress, ambiguity, and cognitive load. Large Language Models (LLMs) and Multi-Agent Systems (MAS) offer opportunities to augment dispatchers. This study aimed to develop and evaluate a taxonomy-grounded, LLM-powered multi-agent system for simulating realistic EMD scenarios. Methods: We constructed a clinical taxonomy (32 chief complaints and 6 caller identities, derived from MIMIC-III) and a six-phase call protocol. Using this framework, we developed an AutoGen-based MAS with Caller and Dispatcher Agents. The system grounds interactions in a fact commons to ensure clinical plausibility and mitigate misinformation. We used a hybrid evaluation framework: four physicians assessed 100 simulated cases for "Guidance Efficacy" and "Dispatch Effectiveness," supplemented by automated linguistic analysis (sentiment, readability, politeness). Results: Human evaluation, with substantial inter-rater agreement (Gwet's AC1 > 0.70), confirmed the system's high performance. Physicians rated both dimensions highly: for Dispatch Effectiveness, the correct potential other agents were contacted in 94% of cases, and for Guidance Efficacy, advice was provided in 91% of cases. Algorithmic metrics corroborated these findings, indicating a predominantly neutral affective profile (73.7% neutral sentiment; 90.4% neutral emotion), high readability (Flesch 80.9), and a consistently polite style (60.0% polite; 0% impolite). Conclusion: Our taxonomy-grounded MAS simulates diverse, clinically plausible dispatch scenarios with high fidelity. The findings support its use for dispatcher training, protocol evaluation, and as a foundation for real-time decision support. This work outlines a pathway for safely integrating advanced AI agents into emergency response workflows.
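The inter-rater statistic reported in the Results, Gwet's AC1, can be illustrated with a minimal sketch. The abstract does not describe how the authors computed it, so the function name `gwet_ac1`, the input layout (one list of labels per case, one label per rater), and the toy ratings below are all assumptions made for illustration. The sketch uses the standard multi-rater form: AC1 = (p_a - p_e) / (1 - p_e), where p_a is the mean pairwise agreement per case and p_e = (1 / (Q - 1)) * sum over categories of pi_q * (1 - pi_q), with pi_q the mean proportion of raters choosing category q.

```python
from collections import Counter


def gwet_ac1(ratings, categories=None):
    """Gwet's AC1 for multiple raters and nominal categories.

    ratings: list of cases; each case is a list with one label per rater.
    categories: full rating scale (hypothetical argument); if omitted,
                the categories observed in the data are used.
    """
    if categories is None:
        categories = sorted({label for case in ratings for label in case})
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs a scale with at least two categories")

    n = len(ratings)
    agreement_total = 0.0
    prevalence_total = {c: 0.0 for c in categories}

    for case in ratings:
        r = len(case)
        if r < 2:
            raise ValueError("each case needs at least two raters")
        counts = Counter(case)
        # Share of rater pairs that agree on this case.
        agreement_total += sum(k * (k - 1) for k in counts.values()) / (r * (r - 1))
        # Share of raters choosing each category on this case.
        for label, k in counts.items():
            prevalence_total[label] += k / r

    p_a = agreement_total / n
    pi = [prevalence_total[c] / n for c in categories]
    # Chance agreement as defined for AC1 (not Cohen's or Fleiss' kappa).
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


# Toy data (invented): four raters, two cases, binary judgement.
toy = [["yes", "yes", "yes", "yes"], ["yes", "yes", "yes", "no"]]
print(round(gwet_ac1(toy), 2))
```

Unlike kappa-type statistics, AC1 stays well defined when almost all ratings fall into one category, which is the usual motivation for reporting it on highly skewed ratings such as "advice provided: yes/no".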