Medical data used in research frequently contains protected health information (PHI), which is subject to strict legal frameworks such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA). Such data must therefore be pseudonymized before use, which presents a significant challenge for many researchers. Because medical data comes in many different forms, a variety of de-identification techniques is required. To facilitate the anonymization of medical imaging data, we have developed an open-source tool that can be used to de-identify DICOM magnetic resonance images, computed tomography images, whole slide images, and magnetic resonance TWIX raw data. In addition, a neural network is used to remove text within the images. The proposed tool automates an elaborate anonymization pipeline for multiple types of inputs, reducing the need for additional tools to de-identify imaging data. We make our code publicly available at https://github.com/code-lukas/medical_image_deidentification.