Face recognition models often operate in a client-server setting where a client extracts a compact face embedding and a server performs similarity search over a template database. This raises privacy concerns, as facial data is highly sensitive. To provide cryptographic privacy guarantees, one can use fully homomorphic encryption (FHE) to perform end-to-end encrypted similarity search. However, existing FHE-based protocols are computationally costly and impose high memory overhead. Building on the prior work HyDia (PoPETS 2025), we introduce algorithmic and system-level improvements targeting real-world deployment with resource-constrained clients. First, we propose BSGS-Diagonal, an algorithm for fast and memory-efficient similarity computation. BSGS-Diagonal substantially shrinks the rotation-key set, lowering both client and server memory requirements, and also improves practical server runtime. It reduces the number of rotation keys by 91%, which translates to approximately 14 GB less memory on the client and lowers overall peak CPU RAM from over 30 GB in the original HyDia to under 10 GB for databases of up to 1M entries. In addition, runtime improves by up to 1.57x in the membership verification scenario and by up to 1.43x in the identification scenario. Second, we introduce fully GPU-optimized kernels for similarity-matrix computation. The implementation is built on FIDESlib, a CKKS-level GPU library based on OpenFHE. Rather than offloading individual CKKS primitives in isolation, the integrated kernels fuse operations to avoid repeated CPU-GPU ciphertext movement and costly conversions between FIDESlib and OpenFHE data structures. As a result, our GPU implementations of HyDia and BSGS-Diagonal achieve speedups of up to 9x and 21x, respectively, enabling sub-second encrypted face recognition for databases of up to 32K entries while further reducing host memory usage.
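To make the rotation-key saving concrete, the following is a minimal plaintext sketch of the generic baby-step giant-step (BSGS) diagonal method for a matrix-vector product. It is not the paper's BSGS-Diagonal algorithm, whose details are not given here. It assumes a square matrix whose dimension is divisible by the baby-step count, simulates ciphertext slot rotation with `np.roll`, and uses hypothetical helper names (`rot`, `diagonals`, `matvec_diagonal`, `matvec_bsgs`). Each function also returns the set of distinct rotation amounts it used, which stands in for the rotation keys an FHE implementation would need.

```python
import numpy as np


def rot(x, k):
    """Cyclic left rotation by k slots (stand-in for a homomorphic rotation)."""
    return np.roll(x, -k)


def diagonals(M):
    """Generalized diagonals: diagonals(M)[i][j] == M[j, (j + i) % n]."""
    n = M.shape[0]
    idx = np.arange(n)
    return [M[idx, (idx + i) % n] for i in range(n)]


def matvec_diagonal(M, v):
    """Plain diagonal method: one rotation of v per nonzero diagonal index."""
    n = len(v)
    diags = diagonals(M)
    acc = np.zeros(n)
    used = set()  # distinct rotation amounts, i.e. rotation keys needed
    for i in range(n):
        if i:
            used.add(i)
        acc += diags[i] * rot(v, i)
    return acc, used


def matvec_bsgs(M, v, baby):
    """BSGS variant: `baby` small rotations of v, then n / baby large ones."""
    n = len(v)
    assert n % baby == 0, "sketch assumes baby divides n"
    giant = n // baby
    diags = diagonals(M)
    used = set()

    # Baby steps: rotate the input vector by 0 .. baby-1 once and reuse.
    baby_rots = []
    for j in range(baby):
        if j:
            used.add(j)
        baby_rots.append(rot(v, j))

    acc = np.zeros(n)
    for k in range(giant):
        shift = k * baby
        inner = np.zeros(n)
        for j in range(baby):
            # Diagonals are known data, so they are pre-rotated by -shift
            # offline; this costs no rotation key on the encrypted input.
            inner += rot(diags[shift + j], -shift) * baby_rots[j]
        # Giant step: a single rotation of the accumulated inner sum.
        if shift:
            used.add(shift)
        acc += rot(inner, shift)
    return acc, used
```

With a 16-dimensional input and `baby=4`, the plain diagonal method uses 15 distinct rotation amounts while the BSGS variant uses 6 (three baby steps plus three giant steps), and both return the same product as `M @ v`. In general the count drops from n - 1 to roughly 2√n when the baby-step count is chosen near √n.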