Privacy-Preserving Graph-Based Machine Learning with Fully Homomorphic Encryption for Collaborative Anti-Money Laundering

Combating money laundering has become increasingly complex with the rise of cybercrime and the digitalization of financial transactions. Graph-based machine learning techniques have emerged as promising tools for Anti-Money Laundering (AML) detection because they capture the intricate relationships within money laundering networks. However, the effectiveness of AML solutions is hindered by data siloed within individual financial institutions, which limits collaboration and overall efficacy. This research presents a novel privacy-preserving approach to collaborative AML machine learning that facilitates secure data sharing across institutions and borders while protecting privacy and maintaining regulatory compliance. Using Fully Homomorphic Encryption (FHE), computations are performed directly on encrypted data, so financial data remains confidential throughout processing. Specifically, FHE over the Torus (TFHE) was integrated with graph-based machine learning using Zama Concrete ML. The research contributes two privacy-preserving pipelines. First, a privacy-preserving Graph Neural Network (GNN) pipeline was explored, using optimization techniques such as quantization and pruning to make the GNN FHE-compatible. Second, a privacy-preserving graph-based XGBoost pipeline leveraging the Graph Feature Preprocessor (GFP) was successfully developed. Experiments demonstrated strong predictive performance: the XGBoost model consistently achieved over 99% accuracy, F1-score, precision, and recall on the balanced AML dataset in both unencrypted and FHE-encrypted inference settings. On the imbalanced dataset, incorporating graph-based features improved the F1-score by 8%. The research underscores the need to balance privacy against computational efficiency.
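
Why quantization and pruning help make a model FHE-compatible can be illustrated with a small, library-independent sketch. It assumes, as in Concrete ML, that encrypted inference runs on low-bit integers and that the integer sum accumulated by a layer must fit a bounded bit-width. The helper names (quantize, prune, accumulator_bits), the 4-bit setting, and the toy weights and inputs are hypothetical. This is not the authors' code or the Concrete ML API; Concrete ML handles quantization internally and exposes it through parameters such as `n_bits` on its built-in models.

```python
import math

# Hypothetical helpers illustrating why quantization and pruning make a
# layer FHE-friendly; this is NOT the Concrete ML API or the authors' code.

def quantize(weights, n_bits):
    """Symmetric uniform quantization of floats to signed n_bits integers."""
    max_abs = max((abs(w) for w in weights), default=0.0) or 1.0
    q_max = 2 ** (n_bits - 1) - 1            # e.g. 7 for 4-bit weights
    scale = max_abs / q_max                  # float step between integer levels
    q = [max(-q_max, min(q_max, round(w / scale))) for w in weights]
    return q, scale


def prune(weights, keep):
    """Magnitude pruning: zero every entry except the `keep` largest in |w|."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept = set(order[:keep])
    return [w if i in kept else 0.0 for i, w in enumerate(weights)]


def accumulator_bits(q_weights, input_bits):
    """Worst-case signed bit-width of sum(w_i * x_i) for unsigned inputs."""
    x_max = 2 ** input_bits - 1
    worst = sum(abs(w) for w in q_weights) * x_max
    return math.ceil(math.log2(worst + 1)) + 1   # +1 for the sign bit


# Toy layer with 16 inputs (hypothetical values).
weights = [0.8, -0.6, 0.5, -0.45, 0.42, -0.35, 0.3, -0.3,
           0.25, -0.25, 0.2, -0.2, 0.15, -0.15, 0.1, -0.1]
inputs = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]  # already 4-bit unsigned

q_dense, s_dense = quantize(weights, n_bits=4)
q_sparse, s_sparse = quantize(prune(weights, keep=4), n_bits=4)

float_out = sum(w * x for w, x in zip(weights, inputs))
int_out = sum(w * x for w, x in zip(q_dense, inputs)) * s_dense  # de-quantized

print("dense accumulator bits :", accumulator_bits(q_dense, input_bits=4))
print("pruned accumulator bits:", accumulator_bits(q_sparse, input_bits=4))
print(f"float output {float_out:.3f} vs 4-bit output {int_out:.3f}")
```

In this toy layer, keeping only the four largest weights lowers the worst-case accumulator width from 11 to 10 bits, while 4-bit quantization keeps the layer output close to the floating-point result. Dropping weights shrinks the integer sums that must be evaluated under encryption, which is why pruning is a natural companion to quantization when a model has to fit an FHE bit budget.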