Accurate and consistent Emergency Severity Index (ESI) assignment remains a persistent challenge in emergency departments, where highly variable free-text triage documentation contributes to mistriage and workflow inefficiencies. This study evaluates whether open-source small language models (SLMs) can serve as reliable, privacy-preserving decision-support tools for clinical triage. We systematically compared multiple SLMs across diverse prompting pipelines and found that clinical vignettes (concise summaries of triage narratives) yielded the most accurate predictions. Among the SLMs evaluated, Qwen2.5-7B demonstrated the strongest balance of accuracy, stability, and computational efficiency. Through large-scale domain adaptation using expert-curated and silver-standard pediatric triage data, fine-tuned Qwen2.5-7B models substantially reduced discordance and clinically significant errors, outperforming all baseline SLMs and advanced proprietary large language models (LLMs) such as GPT-4o. These findings highlight the feasibility of institution-specific SLMs for reliable, privacy-preserving ESI decision support and underscore the importance of targeted fine-tuning over more complex inference strategies.