In the era of big data and rapidly evolving information systems, efficient and accurate data retrieval has become increasingly crucial. Neural graph databases (NGDBs) have emerged as a powerful paradigm that combines the strengths of graph databases (graph DBs) and neural networks to enable efficient storage, retrieval, and analysis of graph-structured data. Neural embedding storage and complex neural logical query answering give NGDBs the ability to generalize. When the graph is incomplete, NGDBs can extract latent patterns and representations to fill gaps in the graph structure, revealing hidden relationships and enabling accurate query answering. However, this capability comes with an inherent trade-off: it introduces additional privacy risks to the database. Malicious attackers can infer more sensitive information from the database using well-designed combinatorial queries. For example, by comparing the answer sets for where Turing Award winners born before 1950 lived and where those born after 1940 lived, an attacker can probably expose where Turing Award winner Hinton has lived, even though that information may have been deleted from the training data due to privacy concerns. In this work, inspired by privacy protection in graph embeddings, we propose a privacy-preserving neural graph database (P-NGDB) to alleviate the risk of privacy leakage in NGDBs. We introduce adversarial training techniques in the training stage to force the NGDB to generate indistinguishable answers when queried with private information, making it more difficult to infer sensitive information through combinations of multiple innocuous queries. Extensive experimental results on three datasets show that P-NGDB can effectively protect private information in the graph database while delivering high-quality public answers in response to queries.
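The combinatorial-query risk described in the abstract can be illustrated with a minimal sketch. It is not the paper's method or an NGDB implementation: plain Python sets stand in for the answer sets an NGDB would predict from embeddings, and every name, birth year, and city below (`winner_a`, `city_x`, `lived_in_answers`, `infer_private_places`) is a hypothetical placeholder invented for illustration.

```python
from typing import Callable, Dict, Set

# Hypothetical toy data: all names, years, and cities are made up.
BIRTH_YEAR: Dict[str, int] = {
    "winner_a": 1935,
    "winner_b": 1947,  # the target whose residences are meant to be private
    "winner_c": 1960,
}
LIVED_IN: Dict[str, Set[str]] = {
    "winner_a": {"city_x"},
    "winner_b": {"city_y", "city_z"},
    "winner_c": {"city_w"},
}


def lived_in_answers(predicate: Callable[[int], bool]) -> Set[str]:
    """Answer the query: where did winners whose birth year satisfies
    `predicate` live? Stands in for an NGDB's predicted answer set."""
    answers: Set[str] = set()
    for person, year in BIRTH_YEAR.items():
        if predicate(year):
            answers |= LIVED_IN[person]
    return answers


def infer_private_places() -> Set[str]:
    """Combine two individually innocuous queries to narrow down the
    residences of anyone born between 1940 and 1950."""
    before_1950 = lived_in_answers(lambda year: year < 1950)
    after_1940 = lived_in_answers(lambda year: year > 1940)
    # Places appearing in both answer sets are candidates for the target.
    return before_1950 & after_1940


if __name__ == "__main__":
    print(sorted(infer_private_places()))
```

In this toy setting the intersection recovers exactly the target's residences, although neither query names the target. The training objective described in the abstract aims to make answers to such privacy-related queries indistinguishable, so that this kind of comparison reveals less.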