WWFedCBMIR: World-Wide Federated Content-Based Medical Image Retrieval

The paper proposes a Federated Content-Based Medical Image Retrieval (FedCBMIR) platform that uses Federated Learning (FL) to address the difficulty of acquiring a diverse medical data set for training CBMIR models. Compared with traditional cancer detection methods, CBMIR helps pathologists diagnose breast cancer more rapidly by identifying similar medical images and relevant patches in prior cases. However, CBMIR in histopathology requires a pool of Whole Slide Images (WSIs) to train a model that extracts an optimal embedding vector and thereby improves search engine performance, and such a pool may not be available in all centers. Strict regulations on sharing medical data also hinder research and model development, making it difficult to collect a rich data set. The proposed FedCBMIR distributes the model to collaborating centers for training without sharing the data set, resulting in shorter training times than local training. FedCBMIR was evaluated in two experiments with three scenarios on BreaKHis and Camelyon17 (CAM17). The study shows that FedCBMIR increases the F1-Score (F1S) of the clients to 98%, 96%, 94%, and 97% in the BreaKHis experiment with a generalized model of four magnifications, and does so in 6.30 hours less than total local training. FedCBMIR also achieves 98% accuracy on CAM17 in 2.49 hours less training time than local training, demonstrating that FedCBMIR is both fast and accurate for pathologists and engineers alike. In addition, FedCBMIR provides similar images at higher magnification to non-developed countries that participate in the worldwide FedCBMIR alongside developed countries, facilitating mitosis measurement in breast cancer diagnosis. We evaluate this scenario by distributing BreaKHis across four centers with different magnifications.
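
As a rough illustration of the two ideas in the abstract, the sketch below combines per-client model parameters without moving any images, then ranks stored embedding vectors by similarity to a query. It is a minimal toy under stated assumptions, not the authors' implementation: the abstract does not name the aggregation rule or the similarity measure, so the size-weighted average (in the style of FedAvg) and the cosine similarity used here are assumptions, and the names `federated_average`, `retrieve_top_k`, and the `patch_*` identifiers are hypothetical.

```python
from math import sqrt


def federated_average(client_weights, client_sizes):
    """Size-weighted average of per-client parameter vectors.

    Assumption: a FedAvg-style rule; the abstract does not specify one.
    Only parameters are exchanged, never the clients' images.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query, index, k=3):
    """Return the k (image_id, score) pairs most similar to the query embedding."""
    scored = [
        (image_id, cosine_similarity(query, embedding))
        for image_id, embedding in index.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy data: two clients with 1 and 3 training samples respectively.
client_weights = [[1.0, 2.0], [3.0, 4.0]]
client_sizes = [1, 3]
global_weights = federated_average(client_weights, client_sizes)

# Toy retrieval index of hypothetical patch embeddings.
index = {
    "patch_a": [1.0, 0.0],
    "patch_b": [0.0, 1.0],
    "patch_c": [0.9, 0.1],
}
top = retrieve_top_k([1.0, 0.0], index, k=2)
```

In a real system the parameter vectors would be the weights of the embedding network after a round of local training, and the index would hold embeddings of previously diagnosed patches.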
