Privacy-Preserving Product-Quantized Approximate Nearest Neighbor Search Framework for Large-scale Datasets via A Hybrid of Fully Homomorphic Encryption and Trusted Execution Environment

Shozo Saeki, Minoru Kawahara, Hirohisa Aman

From arXiv, 15 pages, 4 figures

A nearest-neighbor framework is a fundamental tool for various applications involving Large Language Models (LLMs) and Visual Language Models (VLMs). The vectors used in nearest-neighbor search carry rich information to support similarity search, and this information creates security risks such as embedding inversion and membership inference attacks. Privacy-Preserving Approximate Nearest-Neighbor (PP-ANN) approaches are therefore necessary for highly confidential data. However, conventional PP-ANN approaches based on a Trusted Execution Environment (TEE) or Fully Homomorphic Encryption (FHE) do not achieve practical security or performance. In addition, conventional approaches focus on the search process rather than on generating the database used for nearest-neighbor search. To address these issues, we propose a Privacy-Preserving Product-Quantization Approximate Nearest Neighbor (PPPQ-ANN) framework. PPPQ-ANN provides a multi-layered security structure for vectors based on a hybrid of FHE and TEE. It also minimizes FHE ciphertext computations by combining Product-Quantization (PQ) with optimized data packing. We demonstrate the performance of PPPQ-ANN on million-scale datasets, where it completes database generation in less than 2 hours and exceeds 50 queries per second (QPS) in a sequential search while preserving privacy. By using a hybrid of FHE and TEE, PPPQ-ANN thus optimizes the trade-off between security and performance and achieves practical performance while preserving privacy.

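For readers unfamiliar with Product-Quantization, the following is a minimal plaintext sketch of the general PQ idea the abstract builds on: split each vector into sub-vectors, learn a small codebook per sub-space, store only centroid indices, and answer queries with per-sub-space distance lookup tables. It is not the authors' PPPQ-ANN pipeline. It involves no FHE, no TEE, and no data packing, and every name here (`train_pq`, `encode`, `search`, the toy k-means) is a hypothetical helper chosen for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10, seed=0):
    """Toy Lloyd's k-means; returns k centroids (assumes k <= len(points))."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            buckets[j].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # an empty cluster keeps its previous centroid
                dim = len(bucket[0])
                centroids[j] = [sum(p[d] for p in bucket) / len(bucket)
                                for d in range(dim)]
    return centroids

def split(vec, m):
    """Split a vector into m contiguous sub-vectors (assumes len(vec) % m == 0)."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def train_pq(data, m, k):
    """Learn one codebook of k centroids for each of the m sub-spaces."""
    return [kmeans([split(v, m)[i] for v in data], k, seed=i) for i in range(m)]

def encode(vec, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    m = len(codebooks)
    return [min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
            for sub, cb in zip(split(vec, m), codebooks)]

def search(query, codes, codebooks, topk=1):
    """Asymmetric distance search: the query stays exact, the database is coded."""
    m = len(codebooks)
    # One lookup table per sub-space: distance from the query part to each centroid.
    tables = [[sq_dist(sub, c) for c in cb]
              for sub, cb in zip(split(query, m), codebooks)]
    scored = [(sum(tables[i][code[i]] for i in range(m)), idx)
              for idx, code in enumerate(codes)]
    return [idx for _, idx in sorted(scored)[:topk]]
```

A database is built once with `codebooks = train_pq(data, m, k)` and `codes = [encode(v, codebooks) for v in data]`, after which `search(query, codes, codebooks)` needs only `m` table lookups and additions per stored vector. That small, regular amount of arithmetic per candidate is what makes PQ a plausible fit for settings where each operation is expensive, such as computation over ciphertexts.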