Financial institutions rely on data to drive efficiency, enhance services and prevent financial crime. Sharing data across an organisation or between institutions can support rapid, evidence-based decision-making, including the identification of money laundering and fraud. However, modern data privacy regulations restrict such sharing. Privacy-enhancing technologies are therefore increasingly employed to let organisations derive shared intelligence while remaining compliant. This paper examines the case in which regulatory restrictions prevent a party from sharing data on accounts of interest with another (internal or external) party in order to determine which individuals hold accounts in both datasets. The names of account holders may be recorded differently in each dataset. We introduce a novel privacy-preserving scheme for fuzzy name matching across institutions that applies fully homomorphic encryption to MinHash signatures. A clustering mechanism improves the efficiency of the scheme. The scheme preserves privacy by revealing only the possibility of a potential match to the querying party. We evaluate its practicality and effectiveness on different datasets and compare it against state-of-the-art schemes. Searching for 1000 names takes around 100 seconds against a dataset of 10k names and around 1000 seconds against a dataset of 100k names, which meets the requirements of financial institutions. The scheme also reduces communication overhead by a factor of 30 to 300 relative to the compared schemes.
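To make the role of MinHash signatures in fuzzy name matching concrete, the following is a minimal plaintext sketch. It is not the scheme proposed in the paper: it omits the fully homomorphic encryption layer and the clustering mechanism entirely. The function names (`shingles`, `minhash_signature`, `estimated_similarity`), the use of character bigrams, the signature length of 128 and the SHA-256-based hash family are all illustrative assumptions rather than details taken from the paper.

```python
import hashlib


def shingles(name: str, k: int = 2) -> set[str]:
    """Return the set of character k-grams of a normalised name.

    Normalisation (lower-casing, collapsing whitespace) is an assumption
    made for this sketch.
    """
    s = " ".join(name.lower().split())
    if len(s) < k:
        return {s}
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def _hash(seed: int, token: str) -> int:
    """One member of a seeded hash family, built from SHA-256 (assumed)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def minhash_signature(name: str, num_hashes: int = 128, k: int = 2) -> list[int]:
    """For each seeded hash function, keep the minimum hash over all k-grams."""
    grams = shingles(name, k)
    return [min(_hash(seed, g) for g in grams) for seed in range(num_hashes)]


def estimated_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of matching positions, which estimates the Jaccard similarity
    of the two underlying k-gram sets."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    query = minhash_signature("John Smith")
    for candidate in ["Jon Smith", "JOHN  SMITH", "Maria Garcia"]:
        score = estimated_similarity(query, minhash_signature(candidate))
        print(f"{candidate!r}: {score:.2f}")
```

In a privacy-preserving setting, the position-wise comparison in `estimated_similarity` is the step that would need to be evaluated on encrypted signatures, so that the data holder never sees the query names in the clear. How the paper realises that step is not described in the abstract.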