Client Recruitment for Federated Learning in ICU Length of Stay Prediction

Machine learning and deep learning methods for medical and healthcare applications have improved markedly in recent years. These methods require large amounts of training data, which the medical sector has, albeit in decentralized form. Medical institutions generate vast amounts of data, but sharing and centralizing them remains a challenge because of data and privacy regulations. Federated learning is well suited to tackle these challenges. However, it comes with its own set of open problems, including communication overhead, efficient parameter aggregation, and client selection strategies. In this work, we address the step that precedes the initiation of a federated network for model training: client recruitment. By recruiting clients intelligently, communication overhead and the overall cost of training can be reduced without sacrificing predictive performance. Client recruitment pre-excludes potential clients from the federation based on a set of criteria indicative of their eventual contribution to it. We propose a client recruitment approach that uses only the output distribution and the sample size at each client site. We show that a subset of clients can be recruited without sacrificing model performance while, at the same time, significantly reducing computation time. Applying the recruitment approach to the training of federated models for patient Length of Stay prediction using data from 189 Intensive Care Units, we show that models trained in federations made up of recruited clients significantly outperform federated models trained with the standard procedure in terms of both predictive power and training time.
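The abstract names the two inputs to recruitment (each client's output distribution and its sample size) but not the rule that combines them. The following is a minimal sketch of one way such a rule could look, not the authors' method. It assumes each client shares a histogram of its target values (for example, binned Length of Stay), and it excludes clients that are too small or whose histogram lies too far from the pooled one. The names `ClientSummary` and `recruit`, the use of total variation distance, and both thresholds are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClientSummary:
    """Summary a candidate client shares before training (no raw records)."""
    client_id: str
    label_counts: Sequence[int]  # histogram of the target, e.g. binned LoS

    @property
    def n_samples(self) -> int:
        return sum(self.label_counts)


def normalize(counts: Sequence[float]) -> List[float]:
    """Turn a histogram with a positive total into a probability vector."""
    total = sum(counts)
    return [c / total for c in counts]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def recruit(
    clients: Sequence[ClientSummary],
    min_samples: int = 100,      # hypothetical threshold
    max_distance: float = 0.4,   # hypothetical threshold
) -> List[str]:
    """Return ids of clients that pass both illustrative criteria."""
    non_empty = [c for c in clients if c.n_samples > 0]
    if not non_empty:
        return []

    # Reference distribution: histogram pooled over all non-empty candidates.
    n_bins = len(non_empty[0].label_counts)
    pooled = [sum(c.label_counts[i] for c in non_empty) for i in range(n_bins)]
    reference = normalize(pooled)

    recruited = []
    for c in non_empty:
        if c.n_samples < min_samples:
            continue  # too little data to justify the communication cost
        distance = total_variation(normalize(c.label_counts), reference)
        if distance > max_distance:
            continue  # output distribution too far from the pooled one
        recruited.append(c.client_id)
    return recruited
```

Because only per-client histograms and counts are exchanged, a filter of this kind can run before any model parameters are communicated. Whether distance to the pooled distribution is the right criterion, and where the thresholds should sit, are assumptions of this sketch rather than claims from the abstract.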
