Vertical Federated Learning (VFL) is an increasingly popular approach to collaborative machine learning model training. Existing industrial frameworks employ secure multi-party computation techniques such as homomorphic encryption to ensure data security and privacy. Despite these efforts, studies have shown that data leakage remains a risk in VFL because intermediate representations are correlated with the raw data. Neural networks can capture these correlations accurately, allowing an adversary to reconstruct the data. This underscores the need for continued research into securing VFL systems. Our work shows that hashing is a promising defense against data reconstruction attacks: the one-way nature of hashing makes it difficult for an adversary to recover data from hash codes. However, integrating hashing into VFL introduces new challenges, including vanishing gradients and information loss. To address these issues, we propose HashVFL, which integrates hashing while simultaneously achieving learnability, bit balance, and consistency. Experimental results indicate that HashVFL maintains task performance while defending against data reconstruction attacks. It also brings additional benefits: it reduces the degree of label leakage, mitigates adversarial attacks, and helps detect abnormal inputs. We hope our work will inspire further research into the potential applications of HashVFL.
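The vanishing-gradient and bit-balance issues mentioned above can be illustrated with a small sketch. It is not the HashVFL implementation: the abstract does not say how learnability or bit balance are achieved, so the sketch assumes a sign-based hash layer, a straight-through estimator as a stand-in surrogate gradient, and per-dimension mean centering as a stand-in balancing step. All function names are hypothetical.

```python
# Illustrative sketch only; function names and techniques are assumptions,
# not taken from the HashVFL paper.

def sign_hash(z):
    """Binarize a real-valued embedding into a {-1, +1} hash code."""
    return [1 if v >= 0 else -1 for v in z]

def sign_grad_exact(z):
    """True derivative of sign: zero almost everywhere, so no gradient flows."""
    return [0.0 for _ in z]

def sign_grad_ste(z, upstream):
    """Straight-through estimator (assumed surrogate): pass the upstream
    gradient through unchanged where |z| <= 1, and block it elsewhere."""
    return [g if abs(v) <= 1.0 else 0.0 for v, g in zip(z, upstream)]

def center(batch):
    """Subtract the per-dimension batch mean (assumed balancing step)."""
    n, d = len(batch), len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in batch]

def bit_balance(codes):
    """Fraction of +1 values per bit position; 0.5 means perfectly balanced."""
    n = len(codes)
    return [sum(1 for c in codes if c[j] == 1) / n for j in range(len(codes[0]))]

if __name__ == "__main__":
    z = [0.3, -1.2, 0.0]
    upstream = [0.5, 0.5, 0.5]
    print(sign_hash(z))                # hash code for one embedding
    print(sign_grad_exact(z))          # exact gradient vanishes
    print(sign_grad_ste(z, upstream))  # surrogate gradient survives near zero

    batch = [[2.0, 0.1], [3.0, -0.1], [4.0, 0.3], [5.0, -0.3]]
    print(bit_balance([sign_hash(row) for row in batch]))          # first bit is constant
    print(bit_balance([sign_hash(row) for row in center(batch)]))  # both bits balanced
```

In the uncentered batch the first bit is +1 for every sample and therefore carries no information, which is one form of the information loss that a bit-balance objective is meant to avoid.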