Healthcare institutions exchange large volumes of Personally Identifiable Information (PII) in their day-to-day operations, which exposes the data to a spectrum of cybersecurity threats. This study introduces a federated learning framework, trained on the Wisconsin dataset, that mitigates challenges such as data scarcity and imbalance. The Synthetic Minority Over-sampling Technique (SMOTE) is incorporated to improve robustness, and isolation forests are employed to harden the model against outliers. CatBoost serves as the classifier on all devices. Principal Component Analysis (PCA) is used to identify the optimal features for higher accuracy, and a comparative analysis underscores the importance of hyperparameter tuning. The model achieves an average accuracy of 99.95% on edge devices and 98% on the central server.
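To make the oversampling step concrete, the interpolation idea behind SMOTE can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function name `smote_oversample`, its parameters (`n_new`, `k`, `seed`), and the toy data are all hypothetical, and a real pipeline would more likely rely on an established library. Each synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper for illustration; `minority` is a list of
    equal-length lists of floats, all belonging to the minority class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours by Euclidean distance.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        # Interpolate: the new point lies on the segment from base to other.
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class with two features (made-up values).
toy_minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
new_points = smote_oversample(toy_minority, n_new=4, k=2)
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by the minority class, which is why outlier filtering (for example with an isolation forest) interacts with the quality of the oversampled data.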