Modern financial institutions rely on data for many operations, including driving efficiency, enhancing services and preventing financial crime. Data sharing across an organisation or between institutions can facilitate rapid, evidence-based decision making, including the identification of money laundering and fraud. However, data privacy regulations impose restrictions on data sharing. Privacy-enhancing technologies are increasingly employed to allow organisations to derive shared intelligence while ensuring regulatory compliance. This paper examines the case in which regulatory restrictions prevent a party from sharing data on accounts of interest with another (internal or external) party in order to identify people who hold an account in each dataset. We observe that the names of account holders may be recorded differently in each dataset. We introduce a novel privacy-preserving approach for fuzzy name matching across institutions, employing fully homomorphic encryption with locality-sensitive hashing. The efficiency of the approach is enhanced using a clustering mechanism. The practicality and effectiveness of the proposed approach are evaluated using different datasets. Experimental results demonstrate that searching for 1000 names takes around 100 seconds in a dataset of 10k names and around 1000 seconds in a dataset of 100k names. Moreover, clustering reduces communication overhead by a factor of 30 to 300.
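To illustrate why locality-sensitive hashing suits names that are recorded differently in each dataset, the following is a minimal plaintext sketch using MinHash over character bigrams. It is an assumption made for illustration only: the abstract does not specify which LSH family the approach uses, and the sketch omits both the fully homomorphic encryption layer and the clustering mechanism. All function names and parameters (`shingles`, `minhash_signature`, `estimated_similarity`, `num_hashes`) are hypothetical.

```python
import hashlib


def shingles(name, k=2):
    """Return the set of character k-grams of a name (case and spaces ignored)."""
    s = "".join(name.lower().split())
    grams = {s[i:i + k] for i in range(len(s) - k + 1)}
    return grams or {s}  # fall back to the whole string for very short names


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(name, num_hashes=64):
    """MinHash signature: the minimum hash of the name's k-grams per seed."""
    grams = shingles(name)
    return [min(_hash(seed, g) for g in grams) for seed in range(num_hashes)]


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    query = minhash_signature("Jonathan Smith")
    for candidate in ("Jonathon Smith", "Maria Garcia"):
        score = estimated_similarity(query, minhash_signature(candidate))
        print(candidate, round(score, 2))
```

Under this sketch, two spellings of the same name share most of their bigrams and therefore agree in most signature positions, while unrelated names agree in almost none. In a privacy-preserving setting, a comparison of this kind would have to be carried out on encrypted signatures rather than in the clear.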