This article details the creation of a novel domain ontology at the intersection of epidemiology, medicine, statistics, and computer science. Using the terminology defined by current legislation, it outlines a systematic approach to anonymizing hospital data in preparation for its use in Artificial Intelligence (AI) applications in healthcare. The development process consisted of seven pragmatic steps, including defining the scope, selecting knowledge, reviewing important terms, and constructing classes. These classes describe the designs used in epidemiological studies, machine learning paradigms, types of data and attributes, risks to which anonymized data may be exposed, privacy attacks, techniques to mitigate re-identification, privacy models, and metrics for measuring the effects of anonymization. The article concludes by demonstrating the practical implementation of this ontology in hospital settings for the development and validation of AI.