Privacy-Preserving Medical Image Classification through Deep Learning and Matrix Decomposition

Deep learning (DL)-based solutions have been extensively researched in the medical domain in recent years, enhancing the efficacy of diagnosis, planning, and treatment. Because the use of health-related data is strictly regulated, processing medical records outside the hospital environment to develop and use DL models demands robust data protection measures. At the same time, it can be challenging to guarantee that a DL solution not specifically designed for the given task delivers a minimum level of performance when trained on secured data. Our approach uses singular value decomposition (SVD) and principal component analysis (PCA) to obfuscate the medical images before they are used in the DL analysis. The capability of DL algorithms to extract relevant information from secured data is assessed on an angiographic view classification task based on obfuscated frames. The security level is probed with simulated artificial intelligence (AI)-based reconstruction attacks, considering two threat actors with different prior knowledge of the targeted data. The degree of privacy is measured quantitatively using similarity indices. Although a trade-off between privacy and accuracy must be considered, the proposed technique allows the angiographic view classifier to be trained exclusively on secured data with satisfactory performance and with no computational overhead, model adaptation, or hyperparameter tuning. The obfuscated medical image content is well protected against human perception, and the simulated reconstruction attacks showed that it is also difficult to recover the complete information of the original frames.
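To make the idea of SVD-based obfuscation more concrete, the following is a minimal sketch and not the scheme used in this work. It assumes one simple variant: a frame A = U S V^T is replaced by U S, and the right singular vectors V^T are withheld as a kind of secret key. The function names (svd_obfuscate, pearson_similarity), the synthetic 64 x 64 frame, and the use of Pearson correlation as a stand-in similarity index are all illustrative assumptions.

```python
import numpy as np


def svd_obfuscate(image: np.ndarray) -> np.ndarray:
    """Return U @ diag(S) for image = U @ diag(S) @ Vt, withholding Vt.

    Assumption: this is one simple SVD-based obfuscation, chosen only
    for illustration; the source does not specify its exact scheme here.
    """
    u, s, _vt = np.linalg.svd(image.astype(np.float64), full_matrices=False)
    # Broadcasting scales column j of U by the j-th singular value.
    return u * s


def pearson_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two equally sized arrays.

    Assumption: used as a simple stand-in for a similarity index.
    """
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


if __name__ == "__main__":
    # Synthetic square "frame" so the obfuscated array has the same shape.
    rng = np.random.default_rng(0)
    frame = rng.random((64, 64))

    obfuscated = svd_obfuscate(frame)
    print("similarity(original, obfuscated):",
          pearson_similarity(frame, obfuscated))

    # Anyone holding Vt can invert the transform exactly.
    _, _, vt = np.linalg.svd(frame, full_matrices=False)
    print("exact recovery with Vt:", np.allclose(obfuscated @ vt, frame))
```

In this variant the obfuscated frame keeps the energy (Frobenius norm) of the original, which is one intuition for why a classifier might still learn from it, while recovering the original requires V^T or an attack that approximates it. A perceptual index such as SSIM could replace the Pearson correlation when a library that provides it is available.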
