We systematically assess the performance of three leading API-based de-identification systems (Azure Health Data Services, AWS Comprehend Medical, and OpenAI GPT-4o) against our own de-identification system on a ground-truth dataset of 48 clinical documents annotated by medical experts. Our analysis, conducted at both the entity level and the token level, shows that our solution, Healthcare NLP, achieves the highest accuracy, with a 96% F1-score in protected health information (PHI) detection, substantially outperforming Azure (91%), AWS (83%), and GPT-4o (79%). Beyond accuracy, Healthcare NLP is also the most cost-effective solution, reducing processing costs by over 80% compared with Azure and GPT-4o. Its fixed-cost local deployment model avoids the per-request fees of cloud-based services, which grow with usage volume, making it a scalable and economical choice. Our results underscore a critical limitation: zero-shot commercial APIs fail to meet the accuracy, adaptability, and cost-efficiency required for regulatory-grade clinical de-identification. Healthcare NLP's superior performance, customization capabilities, and economic advantages make it the more viable option for healthcare organizations seeking compliance and scalability in clinical NLP workflows.
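Entity-level and token-level scores can differ on the same predictions, because a partially detected PHI span earns no credit at the entity level but partial credit at the token level. The sketch below illustrates this difference. It assumes exact span-and-label matching for entities and label-aware per-token scoring, with "O" marking non-PHI tokens. These matching rules, the `(start, end, label)` span format, and the example sentence are illustrative assumptions and not necessarily the evaluation protocol used in this study.

```python
from typing import List, Set, Tuple

# Assumed span format: (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, returning 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def entity_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Entity level (assumption: exact match on boundaries and label)."""
    tp = len(gold & pred)
    return f1(tp, len(pred - gold), len(gold - pred))


def spans_to_tags(spans: Set[Span], n_tokens: int) -> List[str]:
    """Expand spans into one tag per token; 'O' means non-PHI."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        for i in range(start, end):
            tags[i] = label
    return tags


def token_f1(gold: Set[Span], pred: Set[Span], n_tokens: int) -> float:
    """Token level (assumption: a PHI token counts only if its label matches)."""
    g = spans_to_tags(gold, n_tokens)
    p = spans_to_tags(pred, n_tokens)
    tp = sum(1 for a, b in zip(g, p) if a != "O" and a == b)
    fp = sum(1 for a, b in zip(g, p) if b != "O" and a != b)
    fn = sum(1 for a, b in zip(g, p) if a != "O" and a != b)
    return f1(tp, fp, fn)


# Hypothetical example: "Patient John Smith seen on 03/14/2021 at Boston General"
tokens = ["Patient", "John", "Smith", "seen", "on", "03/14/2021", "at", "Boston", "General"]
gold = {(1, 3, "NAME"), (5, 6, "DATE"), (7, 9, "LOCATION")}
pred = {(1, 2, "NAME"), (5, 6, "DATE"), (7, 9, "LOCATION")}  # surname missed

print(round(entity_f1(gold, pred), 3))               # 0.667
print(round(token_f1(gold, pred, len(tokens)), 3))   # 0.889
```

Here the truncated name counts as both a false positive and a false negative at the entity level, but only as one missed token at the token level. This is why reporting both granularities gives a fuller picture of PHI coverage.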