Despite the recent success of deep learning in the medical domain, data scarcity remains a problem there, and it is aggravated by privacy and data ownership issues. Distributed learning approaches, including federated learning, have been studied to alleviate these problems, but they suffer from cumbersome communication overhead and weak privacy protection. To address this, we propose a self-supervised masked sampling distillation method for vision transformers that can be performed without continuous communication while still enhancing privacy through a vision transformer-specific encryption method. We demonstrate the effectiveness of our method with extensive experiments on two medical-domain datasets and two different downstream tasks, where it outperforms both the existing distributed learning strategy and the fine-tuning-only baseline. Because the self-supervised model built with the proposed method acquires a general semantic understanding of the modality, we demonstrate its potential as a task-agnostic foundation model for various medical tasks, widening its applicability in the medical domain.
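The abstract does not specify how the masked sampling or the vision transformer-specific encryption is carried out. Purely as an illustration of the kind of patch-level operations involved, the sketch below assumes MAE-style random masking of patch tokens and a key-seeded permutation of patch order. Both are hypothetical stand-ins, not the authors' method, and every function name here (`patchify`, `random_mask`, `shuffle_patches`, `unshuffle_patches`) is introduced only for this example. It uses plain Python lists so that it has no dependencies.

```python
import random
from typing import List, Optional, Sequence, Tuple

Patch = List[float]


def patchify(image: Sequence[Sequence[float]], patch: int) -> List[Patch]:
    """Split an H x W image (a list of rows) into non-overlapping patch x patch
    tiles in raster order, each flattened, as a ViT tokenizer would."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    patches = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            patches.append([image[r + i][c + j] for i in range(patch) for j in range(patch)])
    return patches


def random_mask(patches: List[Patch], mask_ratio: float,
                rng: random.Random) -> Tuple[List[Patch], List[int]]:
    """Hypothetical MAE-style masking: keep a random subset of patch tokens and
    return the visible patches together with their original indices."""
    n = len(patches)
    n_keep = max(1, round(n * (1 - mask_ratio)))
    keep = sorted(rng.sample(range(n), n_keep))
    return [patches[i] for i in keep], keep


def shuffle_patches(patches: List[Patch], key: int) -> Tuple[List[Patch], List[int]]:
    """Hypothetical patch-level 'encryption': permute patch order with a
    permutation seeded by a secret integer key. Python's `random` is NOT a
    cryptographic generator; this only illustrates the idea."""
    order = list(range(len(patches)))
    random.Random(key).shuffle(order)
    # Position `pos` of the output holds the original patch `order[pos]`.
    return [patches[i] for i in order], order


def unshuffle_patches(shuffled: List[Patch], order: List[int]) -> List[Patch]:
    """Invert `shuffle_patches` given the permutation used to produce `shuffled`."""
    restored: List[Optional[Patch]] = [None] * len(order)
    for pos, src in enumerate(order):
        restored[src] = shuffled[pos]
    return restored  # type: ignore[return-value]
```

For a 4 x 4 image with a patch size of 2, `patchify` yields four tokens; with `mask_ratio=0.75`, `random_mask` keeps one of them, and the same `key` always reproduces the same patch permutation, which is what would let a key holder undo it.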