This technical report introduces a Named Clinical Entity Recognition Benchmark for evaluating language models in healthcare. It addresses the natural language processing (NLP) task of extracting structured information from clinical narratives, which supports applications such as automated coding, clinical trial cohort identification, and clinical decision support. The leaderboard provides a standardized platform for assessing diverse language models, including encoder and decoder architectures, on their ability to identify and classify clinical entities across multiple medical domains. It draws on a curated collection of openly available clinical datasets covering entities such as diseases, symptoms, medications, procedures, and laboratory measurements. These entities are standardized according to the Observational Medical Outcomes Partnership (OMOP) Common Data Model, which ensures consistency and interoperability across healthcare systems and datasets and supports a comprehensive evaluation of model performance. Model performance is assessed primarily with the F1-score, complemented by several assessment modes that give a fuller picture of model behavior. The report also includes a brief analysis of the models evaluated to date, highlighting observed trends and limitations. By establishing this benchmarking framework, the leaderboard aims to promote transparency, facilitate comparative analyses, and drive innovation in clinical entity recognition, addressing the need for robust evaluation methods in healthcare NLP.
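To make the F1-based evaluation concrete, the following is a minimal sketch of entity-level F1 under two matching modes. The abstract does not specify which assessment modes the leaderboard uses, so the `exact` and `overlap` modes here are assumptions chosen for illustration, as are the function names, the `(start, end, label)` span representation, and the OMOP-style labels in the example.

```python
from typing import List, Tuple

# Hypothetical span representation: (start, end, label), with end exclusive.
Entity = Tuple[int, int, str]


def _overlaps(a: Entity, b: Entity) -> bool:
    """True if two spans share a label and at least one character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def entity_f1(gold: List[Entity], pred: List[Entity], mode: str = "exact") -> float:
    """Entity-level F1 under an assumed matching mode.

    mode="exact":   a prediction counts only if span and label match exactly.
    mode="overlap": a prediction counts if it overlaps a gold span with the same label.
    """
    if mode == "exact":
        matched_pred = sum(1 for p in pred if p in gold)
        matched_gold = sum(1 for g in gold if g in pred)
    elif mode == "overlap":
        matched_pred = sum(1 for p in pred if any(_overlaps(p, g) for g in gold))
        matched_gold = sum(1 for g in gold if any(_overlaps(g, p) for p in pred))
    else:
        raise ValueError(f"unknown mode: {mode}")

    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative data only; labels mimic OMOP domains but are not taken from the benchmark.
gold = [(0, 8, "condition"), (20, 29, "drug")]
pred = [(0, 8, "condition"), (22, 29, "drug"), (40, 45, "procedure")]

exact_f1 = entity_f1(gold, pred, mode="exact")      # precision 1/3, recall 1/2
overlap_f1 = entity_f1(gold, pred, mode="overlap")  # precision 2/3, recall 1
```

In this example the second prediction has the right label but a shifted start offset, so it is a miss under exact matching and a hit under overlap matching. This is the kind of difference that complementary assessment modes are meant to expose.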