This report addresses the technical aspects of de-identification of medical images of human subjects and biospecimens, such that re-identification risk of ethical, moral, and legal concern is sufficiently reduced to allow unrestricted public sharing for any purpose, regardless of the jurisdiction of the source and distribution sites. All medical images, regardless of the mode of acquisition, are considered, though the primary emphasis is on those with accompanying data elements, especially those encoded in formats in which the data elements are embedded, particularly Digital Imaging and Communications in Medicine (DICOM). These images include image-like objects such as Segmentations, Parametric Maps, and Radiotherapy (RT) Dose objects. The scope also includes related non-image objects, such as RT Structure Sets, Plans and Dose Volume Histograms, Structured Reports, and Presentation States. Only de-identification of publicly released data is considered, and alternative approaches to privacy preservation, such as federated learning for artificial intelligence (AI) model development, are out of scope, as are issues of privacy leakage from AI model sharing. Only technical issues of public sharing are addressed.