Healthcare is one of the foremost applications of machine learning (ML). Traditionally, ML models are trained on central servers that aggregate data from many distributed devices and then predict outcomes for newly generated data. Because the central model can access sensitive user information, this setup raises privacy concerns. Federated learning (FL) can help address this issue: a global model sends a copy of itself to every client, each client trains its copy locally, and the clients send their updates (weights) back to the global model. Over repeated rounds of this exchange, the global model becomes more accurate. Data privacy is protected during training because the training is conducted locally on the clients' devices. However, the global model is susceptible to data poisoning. We develop a privacy-preserving FL technique for a skin cancer dataset and show that the model is prone to data poisoning attacks. Ten clients train the model, and one of them intentionally flips labels as an attack, which reduces the accuracy of the global model. As the percentage of flipped labels increases, accuracy decreases noticeably. We use the stochastic gradient descent optimization algorithm to find the best accuracy for the model. Although FL can protect user privacy in healthcare diagnostics, it is also vulnerable to data poisoning, which must be addressed.
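The two mechanisms described above, label flipping on one client and aggregation of client weights into the global model, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the abstract does not specify the aggregation rule, so a sample-count-weighted average (in the style of federated averaging) is assumed, and the names `flip_labels` and `federated_average`, the flat weight vectors, and the "replace with any other class" flipping scheme are all hypothetical.

```python
import random
from typing import List, Sequence


def flip_labels(labels: Sequence[int], fraction: float, num_classes: int,
                rng: random.Random) -> List[int]:
    """Return a copy of `labels` with about `fraction` of entries changed.

    Assumption: a flipped label is replaced by a different class chosen
    uniformly at random. The source does not state the flipping scheme.
    """
    flipped = list(labels)
    n_flip = int(round(fraction * len(flipped)))
    for i in rng.sample(range(len(flipped)), n_flip):
        # Pick any class other than the current one.
        choices = [c for c in range(num_classes) if c != flipped[i]]
        flipped[i] = rng.choice(choices)
    return flipped


def federated_average(client_weights: List[List[float]],
                      client_sizes: List[int]) -> List[float]:
    """Combine client weight vectors into one global vector.

    Assumption: each client is weighted by its number of training samples.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [0, 1] * 10
    poisoned = flip_labels(clean, fraction=0.25, num_classes=2, rng=rng)
    print(sum(a != b for a, b in zip(clean, poisoned)), "labels flipped")
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In a setup like the one described, one of the ten clients would call something like `flip_labels` on its local labels before training, and the server would call something like `federated_average` on the weights returned by all ten clients. Because the poisoned client's weights enter the average like any other client's, raising `fraction` is the knob that corresponds to the increasing label-flipping percentage discussed above.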