Patient datasets contain confidential information that is protected by laws and regulations such as HIPAA and GDPR. Assembling a complete view of a patient's information therefore requires privacy-preserving entity resolution (PPER), which identifies records that refer to the same patient across multiple databases held by different healthcare organizations without compromising data privacy. Existing methods often lack cryptographic security or are computationally impractical for real-world datasets. We introduce a PPER pipeline based on AMPPERE, a secure abstract computation model built on cryptographic tools such as homomorphic encryption. Our approach is tailored to patient datasets through extensive parallelization and optimal parameter settings chosen specifically for such data. Experimental results demonstrate the proposed method's accuracy and efficiency compared with various baselines.