SecureBoost is a tree-boosting algorithm that leverages homomorphic encryption (HE) to protect data privacy in vertical federated learning. SecureBoost and its variants have been widely adopted in fields such as finance and healthcare. However, the hyperparameters of SecureBoost are typically configured heuristically, solely to optimize model performance (i.e., utility), under the assumption that privacy is already secured. Our study found that SecureBoost and some of its variants are still vulnerable to label leakage. Because of this vulnerability, the current heuristic hyperparameter configuration of SecureBoost may yield a suboptimal trade-off between utility, privacy, and efficiency, all of which are pivotal to a trustworthy federated learning system. To address this issue, we propose the Constrained Multi-Objective SecureBoost (CMOSB) algorithm, which aims to approximate the set of Pareto-optimal solutions, where each solution is a set of hyperparameters that achieves an optimal trade-off between utility loss, training cost, and privacy leakage. We design measurements for the three objectives, including a novel label inference attack, named the instance clustering attack (ICA), to measure the privacy leakage of SecureBoost. Additionally, we provide two countermeasures against ICA. The experimental results demonstrate that CMOSB yields hyperparameters superior to those optimized by grid search and Bayesian optimization in terms of the trade-off between utility loss, training cost, and privacy leakage.
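To make the notion of a constrained Pareto-optimal set of hyperparameters concrete, the following is a minimal sketch and not the CMOSB algorithm itself. It filters a handful of candidate configurations down to those that satisfy upper bounds on the three objectives and are not dominated by any other feasible candidate. The configuration names, objective values, and bounds are hypothetical and chosen only for illustration, and the functions `dominates` and `pareto_front` are illustrative helpers, not part of any SecureBoost implementation.

```python
from typing import Dict, List, Tuple

# Objective vector: (utility_loss, training_cost, privacy_leakage).
# All three objectives are minimized in this sketch.
Objectives = Tuple[float, float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is no worse than b on every objective
    and strictly better on at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(
    candidates: Dict[str, Objectives],
    upper_bounds: Objectives,
) -> List[str]:
    """Return the names of feasible, non-dominated candidates, sorted by name."""
    # Constraint step: drop candidates that exceed any upper bound.
    feasible = {
        name: obj
        for name, obj in candidates.items()
        if all(value <= bound for value, bound in zip(obj, upper_bounds))
    }
    # Pareto step: keep candidates that no other feasible candidate dominates.
    return sorted(
        name
        for name, obj in feasible.items()
        if not any(dominates(other, obj) for other in feasible.values())
    )


# Hypothetical hyperparameter configurations with made-up objective values.
candidates = {
    "depth3_trees50": (0.10, 20.0, 0.30),
    "depth5_trees50": (0.08, 35.0, 0.45),
    "depth5_trees100": (0.09, 60.0, 0.50),  # dominated by depth5_trees50
    "depth7_trees100": (0.05, 90.0, 0.70),  # exceeds the privacy-leakage bound
}

front = pareto_front(candidates, upper_bounds=(0.2, 100.0, 0.6))
print(front)
```

In this toy setting, no single configuration is best on all three objectives, so the result is a set of trade-offs rather than one optimum. A practical search would evaluate the objectives by actually training SecureBoost and running the attack, and would generate candidates with a multi-objective optimizer rather than from a fixed list.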