This report addresses the technical aspects of de-identification of medical images of human subjects and biospecimens, such that re-identification risk of ethical, moral, and legal concern is sufficiently reduced to allow unrestricted public sharing for any purpose, regardless of the jurisdiction of the source and distribution sites. All medical images, regardless of the mode of acquisition, are considered, though the primary emphasis is on those with accompanying data elements, especially those encoded in formats in which the data elements are embedded, particularly Digital Imaging and Communications in Medicine (DICOM). These images include image-like objects such as Segmentations, Parametric Maps, and Radiotherapy (RT) Dose objects. The scope also includes related non-image objects, such as RT Structure Sets, Plans and Dose Volume Histograms, Structured Reports, and Presentation States. Only de-identification of publicly released data is considered, and alternative approaches to privacy preservation, such as federated learning for artificial intelligence (AI) model development, are out of scope, as are issues of privacy leakage from AI model sharing. Only technical issues of public sharing are addressed.