In recent years, remarkable progress has been made in digital pathology, driven by the ability to model complex tissue patterns with advanced deep-learning algorithms. However, the robustness of these models is often severely compromised in the presence of data shifts (e.g., different stains, organs, or centers). Continual learning (CL) techniques aim to reduce the forgetting of past data when learning new data under distribution shift. In particular, rehearsal-based CL techniques, which store some past data in a buffer and replay it alongside new data, have proven effective in medical image analysis tasks. However, storing past data raises privacy concerns, which motivates our novel Generative Latent Replay-based CL (GLRCL) approach. Instead of storing past samples, GLRCL captures previous distributions with Gaussian Mixture Models, which are then used to generate features and perform latent replay together with new data. We systematically evaluate the proposed framework under different shift conditions in histopathology data, including stain and organ shifts. Our approach significantly outperforms popular buffer-free CL approaches and performs on par with rehearsal-based CL approaches that require large buffers and pose serious privacy risks.
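The core mechanism described above can be sketched as follows. This is a minimal illustration of GMM-based generative latent replay, not the authors' implementation: all feature dimensions, component counts, and data are hypothetical, and the features are assumed to come from a frozen feature extractor.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical latent features from a past task, e.g. patch embeddings
# produced by a frozen backbone (500 samples, 64-dim).
past_features = rng.normal(loc=0.0, scale=1.0, size=(500, 64))

# Instead of storing past_features in a replay buffer, summarize their
# distribution with a compact Gaussian Mixture Model.
gmm = GaussianMixture(n_components=5, covariance_type="diag", random_state=0)
gmm.fit(past_features)

# When a new task arrives, sample synthetic "replay" features from the GMM...
replay_features, _ = gmm.sample(n_samples=200)

# ...and mix them with the new task's features before updating the
# downstream classifier, so no raw past data ever needs to be kept.
new_features = rng.normal(loc=2.0, scale=1.0, size=(300, 64))
batch = np.concatenate([replay_features, new_features], axis=0)
print(batch.shape)  # (500, 64)
```

Only the GMM parameters (means, covariances, weights) are retained across tasks, which is what avoids storing patient-identifiable samples.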