Fully Homomorphic Encryption (FHE) enables privacy-preserving machine learning but incurs extreme computational and memory overhead. These costs come not only from expensive low-level primitives, including the Number Theoretic Transform (NTT), rotation, and key switching, but also from inefficient ciphertext packing at the application level. Existing packing strategies typically preserve either neighboring data elements or feature grouping, but not both, which leads to wasted ciphertext slots, excessive rotations, and inflated ciphertext counts. We propose FEnc$^2$, a unified and principled fragment-based encoding framework for CKKS-based private convolutional neural network inference. FEnc$^2$ optimizes slot utilization, rotation complexity, and ciphertext density through two components: (1) Conv-aware Encoding, which analytically selects an optimal fragment size to decouple spatial dependencies and jointly minimize inner and outer rotations across layers, and (2) Arch-aware Ct Compression, which restores ciphertext density after feature- or channel-reduction layers. Together, these transformations reshape the structure of the encrypted workload and reduce the number of homomorphic operations by one to two orders of magnitude. With memory capacity fully utilized, i.e., at maximum batch size, FEnc$^2$ achieves end-to-end latency speedups over the state-of-the-art Orion of up to 228.83x on GPU and 226.06x on CPU for LeNet on MNIST, and up to 4.55x on GPU and 9.43x on CPU for MobileNet on ImageNet. FEnc$^2$ is hardware-agnostic yet architecturally transformative: by optimizing the encrypted tensor layout before execution, it reduces ciphertext count and workload pressure on hardware, complementing primitive-level optimizations such as NTT and key-switching accelerators. These results show that application-level data layout is a first-order architectural design dimension for encrypted inference and an important enabler for next-generation FHE systems.
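To make the notions of slot utilization and ciphertext count concrete, the following minimal sketch counts ciphertexts under two simple packing choices. It is not the FEnc$^2$ encoding itself and performs no encryption; the function name `packing_stats`, the slot count of 2^15, and the 16-channel 5x5 feature map are all illustrative assumptions rather than values taken from the paper.

```python
import math


def packing_stats(channels, height, width, slots, channels_per_ct):
    """Return (ciphertext count, slot utilization) for a simple packing.

    Hypothetical helper for illustration only: each ciphertext holds
    `channels_per_ct` whole channels of size height * width.
    """
    per_channel = height * width
    if channels_per_ct < 1 or channels_per_ct * per_channel > slots:
        raise ValueError("channels_per_ct does not fit in one ciphertext")
    num_cts = math.ceil(channels / channels_per_ct)
    # Fraction of all available slots that carry real data.
    utilization = channels * per_channel / (num_cts * slots)
    return num_cts, utilization


SLOTS = 2 ** 15  # assumed CKKS slot count (ring degree 2^16)

# One channel per ciphertext: many sparsely filled ciphertexts.
sparse = packing_stats(16, 5, 5, SLOTS, channels_per_ct=1)

# As many whole channels as fit per ciphertext: one denser ciphertext.
dense = packing_stats(16, 5, 5, SLOTS, channels_per_ct=SLOTS // 25)

print("sparse:", sparse)
print("dense: ", dense)
```

Under these assumptions, the dense layout needs one ciphertext where the sparse layout needs 16, so every per-ciphertext operation is issued 16 times less often. Even the dense layout leaves most slots empty for such a small feature map, which is the kind of loss of density that arises after feature- or channel-reduction layers.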