Homomorphic encryption (HE) enables arithmetic operations to be performed directly on encrypted data. It is essential for privacy-preserving applications such as machine learning, medical diagnosis, and financial data analysis. In popular HE schemes, ciphertext multiplication is defined only for two inputs, yet many HE applications need to multiply more than two inputs. In our previous work, a three-input ciphertext multiplication method for the CKKS HE scheme was developed. This paper first reformulates the three-input ciphertext multiplication so that computations can be combined, further reducing the complexity. The second contribution is extending the multiplication to multiple inputs without increasing the noise overhead. Additional evaluation keys are introduced to relinearize the polynomial multiplication results. To reduce the complexity contributed by the large number of rescaling units in the multiplier, a theoretical analysis is developed to relocate the rescaling, and a multi-level rescaling approach is proposed that implements combined rescaling with complexity similar to that of a single rescaling unit. Guidelines and examples are provided on how to partition the inputs so that more rescaling operations can be combined. Additionally, efficient hardware architectures are designed to implement the proposed multipliers. The improved three-input ciphertext multiplier reduces the logic area and latency by 15% and 50%, respectively, compared to the best prior design. For multipliers with more inputs, ranging from 4 to 12, the architectural analysis shows 32% savings in area and 45% shorter latency, on average, compared to prior work.
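To make concrete why a multi-input product calls for additional evaluation keys, the following is a minimal toy sketch, not the method of this paper. It assumes a noise-free, scalar stand-in for a ciphertext: a pair (c0, c1) that decrypts as c0 + c1·s mod Q, whereas CKKS works with polynomials in a ring, adds noise, and scales the message. The modulus `Q` and the helpers `encrypt`, `poly_mul`, `multiply`, and `decrypt` are hypothetical names chosen for illustration. Multiplying k such pairs directly yields k + 1 components, one per power of the secret s, so terms in s^2, s^3, and higher appear; these are the terms that relinearization must remove. Rescaling and relinearization themselves are not modeled.

```python
import random

# Toy modulus (assumption). Real CKKS uses a chain of moduli and ring polynomials.
Q = 2**61 - 1

def encrypt(m, s, rng):
    """Return (c0, c1) with c0 + c1*s == m (mod Q). No noise is added in this toy."""
    c1 = rng.randrange(Q)
    c0 = (m - c1 * s) % Q
    return (c0, c1)

def poly_mul(a, b):
    """Multiply two polynomials in s, coefficients mod Q, lowest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % Q
    return out

def multiply(cts):
    """Multiply any number of two-component ciphertexts without relinearization."""
    acc = [1]
    for ct in cts:
        acc = poly_mul(acc, list(ct))
    return acc

def decrypt(ct, s):
    """Evaluate the components at the secret: sum of d_j * s^j mod Q."""
    return sum(d * pow(s, j, Q) for j, d in enumerate(ct)) % Q

rng = random.Random(0)
s = rng.randrange(Q)
msgs = [3, 5, 7]
cts = [encrypt(m, s, rng) for m in msgs]
prod = multiply(cts)

# Three inputs give four components: coefficients of 1, s, s^2, s^3.
assert len(prod) == 4
# Decrypting the expanded product recovers 3 * 5 * 7.
assert decrypt(prod, s) == 105
```

In this toy, the three-input product can only be decrypted by someone who can evaluate s^2 and s^3; reducing it back to two components is the step for which extra evaluation keys would be needed.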