Privacy-Preserving Federated Learning over Vertically and Horizontally Partitioned Data for Financial Anomaly Detection

Swanand Ravindra Kadhe, Heiko Ludwig, Nathalie Baracaldo, Alan King, Yi Zhou, Keith Houck, Ambrish Rawat, Mark Purcell, Naoise Holohan, Mikio Takeuchi, Ryo Kawahara, Nir Drucker, Hayim Shaul, Eyal Kushnir, Omri Soceanu

From arXiv. Prize winner in the U.S. Privacy Enhancing Technologies (PETs) Prize Challenge.

The effective detection of evidence of financial anomalies requires collaboration among multiple entities that own diverse sets of data, such as a payment network system (PNS) and its partner banks. Trust among these financial institutions is limited by regulation and competition. Federated learning (FL) enables entities to collaboratively train a model when data is either vertically or horizontally partitioned across the entities. However, in real-world financial anomaly detection scenarios, the data is partitioned both vertically and horizontally, and hence existing FL approaches cannot be used in a plug-and-play manner. Our novel solution, PV4FAD, combines fully homomorphic encryption (HE), secure multi-party computation (SMPC), differential privacy (DP), and randomization techniques to balance privacy and accuracy during training and to prevent inference threats at model deployment time. Our solution provides input privacy through HE and SMPC, and output privacy against inference-time attacks through DP. Specifically, we show that, in the honest-but-curious threat model, banks do not learn any sensitive features about PNS transactions, and the PNS does not learn any information about the banks' datasets; it learns only prediction labels. We also develop and analyze a DP mechanism to protect output privacy during inference. Our solution generates high-utility models by significantly reducing the per-bank noise level while satisfying distributed DP. To ensure high accuracy, our approach produces an ensemble model, specifically a random forest. This enables us to take advantage of the well-known properties of ensembles to reduce variance and increase accuracy. Our solution won second prize in the first phase of the U.S. Privacy Enhancing Technologies (PETs) Prize Challenge.
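The abstract's claim that distributed DP lets each bank add less noise can be illustrated with a minimal Python sketch. It is not the PV4FAD mechanism, whose details the abstract does not give. It assumes a generic distributed Gaussian scheme in which each of n banks adds independent noise with standard deviation sigma / sqrt(n), and only the sum of the noisy values is revealed (a plain `sum` stands in for the HE/SMPC aggregation). All function names and parameters here are hypothetical.

```python
import math
import random


def per_bank_sigma(target_sigma: float, num_banks: int) -> float:
    """Std of the noise each bank adds so that the SUM of all banks'
    independent Gaussian noise has std `target_sigma`.
    Variances add: num_banks * (target_sigma**2 / num_banks) = target_sigma**2.
    """
    return target_sigma / math.sqrt(num_banks)


def noisy_share(local_value: float, sigma: float, rng: random.Random) -> float:
    """A bank perturbs its local statistic before contributing it."""
    return local_value + rng.gauss(0.0, sigma)


def aggregate(shares):
    """Assumption: stand-in for secure aggregation. In a real system only
    this sum would be revealed, never the individual noisy shares."""
    return sum(shares)


def demo(num_banks: int = 10, target_sigma: float = 4.0, seed: int = 0):
    """Return (true_total, noisy_total, per_bank_std) for toy local counts."""
    rng = random.Random(seed)
    local_counts = [float(rng.randint(0, 50)) for _ in range(num_banks)]
    sigma_i = per_bank_sigma(target_sigma, num_banks)
    shares = [noisy_share(c, sigma_i, rng) for c in local_counts]
    return sum(local_counts), aggregate(shares), sigma_i
```

Under these assumptions, the aggregate carries noise of standard deviation sigma, whereas each bank independently adding noise of standard deviation sigma would inflate the aggregate noise to sigma * sqrt(n). The per-bank guarantee in this sketch holds only if the individual shares stay hidden, which is why the aggregation step must be secure.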
