SecureBoost is a tree-boosting algorithm that leverages homomorphic encryption to protect data privacy in the vertical federated learning setting. It is widely used in fields such as finance and healthcare because of its interpretability, effectiveness, and privacy-preserving capability. However, SecureBoost suffers from high computational complexity and a risk of label leakage. To harness its full potential, the hyperparameters of SecureBoost should be chosen carefully to strike an optimal balance among utility, efficiency, and privacy. Existing methods set hyperparameters either empirically or heuristically, which is far from optimal. To fill this gap, we propose a Constrained Multi-Objective SecureBoost (CMOSB) algorithm that finds Pareto-optimal solutions, each of which is a set of hyperparameters achieving an optimal tradeoff among utility loss, training cost, and privacy leakage. We design measurements for the three objectives. In particular, privacy leakage is measured using our proposed instance clustering attack. Experimental results demonstrate that CMOSB yields not only hyperparameters superior to the baseline but also optimal sets of hyperparameters that can support the flexible requirements of federated learning (FL) participants.
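To make the notion of a Pareto-optimal set of hyperparameters concrete, the following is a minimal sketch of non-dominated filtering over the three objectives named above (utility loss, training cost, privacy leakage), all treated as quantities to minimize. It is not the CMOSB algorithm itself. The function names, the candidate values, and the form of the constraint (a simple upper bound per objective) are assumptions made for illustration only and do not come from the paper.

```python
from typing import List, Sequence, Tuple

# One candidate = (utility_loss, training_cost, privacy_leakage); lower is better.
Objectives = Tuple[float, float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Objectives]) -> List[int]:
    """Indices of points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


def constrained_pareto_front(
    points: List[Objectives], bounds: Sequence[float]
) -> List[int]:
    """Pareto front restricted to points within per-objective upper bounds.

    ASSUMPTION: constraints are modeled as upper bounds on each objective;
    the abstract does not specify the actual form of the constraints.
    """
    feasible = [
        i for i, p in enumerate(points) if all(x <= b for x, b in zip(p, bounds))
    ]
    return [
        i
        for i in feasible
        if not any(dominates(points[j], points[i]) for j in feasible if j != i)
    ]


# Hypothetical placeholder values, NOT experimental results.
# Each tuple stands for the measured objectives of one hyperparameter setting.
CANDIDATES: List[Objectives] = [
    (0.10, 5.0, 0.30),
    (0.20, 3.0, 0.20),
    (0.15, 6.0, 0.35),  # worse than the first candidate in all three objectives
    (0.05, 9.0, 0.50),
]

if __name__ == "__main__":
    print("Pareto front:", pareto_front(CANDIDATES))
    print(
        "With cost bound 8.0:",
        constrained_pareto_front(CANDIDATES, (1.0, 8.0, 1.0)),
    )
```

Each index that survives the filter corresponds to a hyperparameter setting that cannot be improved in one objective without worsening another, which is the sense in which a participant can pick among several "optimal" settings according to their own requirements. A practical multi-objective search would generate and evaluate candidates iteratively rather than filter a fixed list.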