CECILIA: Comprehensive Secure Machine Learning Framework

From arXiv. Preprint version of the paper "A privacy-preserving approach for cloud-based protein fold recognition", published in Patterns (main paper ~8 pages, supplement ~5 pages).

Since ML algorithms have proven successful in many different applications, there is also great interest in privacy-preserving (PP) ML methods for building models on sensitive data. Moreover, the growing number of data sources and the high computational power these algorithms require force individuals to outsource the training and/or inference of ML models to cloud services. To address this, we propose CECILIA, a secure 3-party computation framework offering PP building blocks that allow complex operations to be performed privately. In addition to adapted common operations such as addition and multiplication, it offers multiplexer, most significant bit and modulus conversion. The first two are novel in terms of methodology, and the last is novel in terms of both functionality and methodology. CECILIA also provides two complex novel methods: the exact exponential of a public base raised to the power of a secret value, and the inverse square root of a secret Gram matrix. We use CECILIA to realize private inference on pre-trained recurrent kernel networks (RKNs), which require more complex operations than most other DNNs, for the structural classification of proteins; this is the first study to accomplish PP inference on RKNs. In addition to the successful private computation of the basic building blocks, the results demonstrate that we perform an exact and fully private exponential computation, which the literature has so far handled only by approximation. They also show that we compute the exact inverse square root of a secret Gram matrix up to a certain privacy level, which has not been addressed in the literature at all. We also analyze the scalability of CECILIA in various settings on a synthetic dataset. The framework shows great promise for making other ML algorithms, as well as further computations, privately computable with its building blocks.
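The abstract lists addition, multiplication and a multiplexer among the building blocks. As a rough illustration of how such primitives generally operate on secret-shared data, here is a minimal Python sketch. It assumes 2-out-of-2 additive sharing over the ring Z_{2^64}, with a third helper party supplying Beaver multiplication triples. The ring size and every function name (`share`, `reconstruct`, `helper_triple`, `mul`, `mux`) are hypothetical. The multiplexer is the textbook formula x + s*(y - x) built on Beaver-triple multiplication, not CECILIA's own protocol. All parties are simulated in one process, so the sketch shows the arithmetic only, not a secure deployment.

```python
import secrets

# Hypothetical sketch: ring Z_{2^64}; all "parties" are simulated in one process.
MOD = 1 << 64
Shares = tuple[int, int]  # (share held by party 0, share held by party 1)


def share(x: int) -> Shares:
    """Split x into two random additive shares with x = s0 + s1 (mod 2^64)."""
    s0 = secrets.randbelow(MOD)
    return s0, (x - s0) % MOD


def reconstruct(s: Shares) -> int:
    """Open a shared value by adding both shares."""
    return (s[0] + s[1]) % MOD


def add(x: Shares, y: Shares) -> Shares:
    """Local addition: each party adds its own shares, no communication needed."""
    return (x[0] + y[0]) % MOD, (x[1] + y[1]) % MOD


def sub(x: Shares, y: Shares) -> Shares:
    """Local subtraction, analogous to add."""
    return (x[0] - y[0]) % MOD, (x[1] - y[1]) % MOD


def helper_triple() -> tuple[Shares, Shares, Shares]:
    """Hypothetical helper party: samples a, b, sets c = a*b and shares all three."""
    a = secrets.randbelow(MOD)
    b = secrets.randbelow(MOD)
    return share(a), share(b), share(a * b % MOD)


def mul(x: Shares, y: Shares) -> Shares:
    """Beaver-triple multiplication of two shared values."""
    a, b, c = helper_triple()
    # The parties open e = x - a and f = y - b; since a and b are uniformly
    # random masks, the opened values reveal nothing about x and y.
    e = reconstruct(sub(x, a))
    f = reconstruct(sub(y, b))
    # x*y = c + e*b + f*a + e*f; only party 0 adds the public term e*f.
    z0 = (c[0] + e * b[0] + f * a[0] + e * f) % MOD
    z1 = (c[1] + e * b[1] + f * a[1]) % MOD
    return z0, z1


def mux(x: Shares, y: Shares, bit: Shares) -> Shares:
    """Textbook multiplexer: shares of x if bit == 0, shares of y if bit == 1."""
    return add(x, mul(bit, sub(y, x)))


x, y = share(7), share(42)
print(reconstruct(mux(x, y, share(0))))  # 7
print(reconstruct(mux(x, y, share(1))))  # 42
print(reconstruct(mul(share(6), share(9))))  # 54
```

Addition and subtraction need no communication, while each multiplication consumes one helper-generated triple and one round of openings, which is why secure protocols try to keep the number of multiplications low. Negative and fractional values would additionally need a fixed-point encoding, which this sketch omits.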
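Why an exponential with a public base and a secret exponent can, in principle, be computed exactly rather than approximated comes down to the multiplicativity of exponentiation. The derivation below is a sketch assuming an integer exponent x held as two additive shares x_0, x_1 in Z_{2^n} (unsigned representatives); the wrap-around bit w is a symbol introduced here for illustration. It is not claimed to be CECILIA's protocol, and it ignores the fixed-point encoding and the magnitude of the factors b^{x_i}, which a practical protocol must keep under control.

```latex
\begin{align*}
x_0 + x_1 &= x + w \cdot 2^{n}, \qquad x_0, x_1, x \in [0, 2^{n}),\ w \in \{0, 1\} \\
b^{x_0} \cdot b^{x_1} &= b^{x_0 + x_1} = b^{x} \cdot b^{w \cdot 2^{n}} \\
b^{x} &= b^{x_0} \cdot b^{x_1} \cdot \bigl(1 - w + w\, b^{-2^{n}}\bigr)
\end{align*}
```

Each party can evaluate its own factor b^{x_i} locally without interaction. Combining the two factors then takes one secure multiplication, and removing the wrap-around factor requires a secret-shared w, a task for which a most-significant-bit or multiplexer building block could plausibly serve.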

翻译：随着机器学习算法在许多不同应用中证明了其成功，针对敏感数据构建模型的隐私保护机器学习方法也引起了广泛关注。此外，数据源数量的增加以及这些算法所需的高计算能力，迫使个人将机器学习模型的训练和/或推理外包给提供此类服务的云平台。为此，我们提出了一个安全的三方计算框架CECILIA，它提供隐私保护基础构件，以实现复杂的隐私计算操作。除了经过适配的常见操作（如加法和乘法）外，它还提供了多路复用器、最高有效位和模数转换功能。前两者在方法学上是新颖的，而后者在功能和方法学上均具有新颖性。CECILIA还包含两种复杂的新方法：公开底数以秘密值为指数的精确指数计算，以及秘密格拉姆矩阵的精确逆平方根计算。我们利用CECILIA实现了对预训练RKNs的隐私推理——这是首个完成RKNs隐私保护推理的研究，应用于蛋白质结构分类任务，该任务比大多数其他深度神经网络需要更复杂的操作。除了成功实现基础构件的隐私计算外，结果表明我们实现了精确且完全隐私的指数计算（迄今为止文献中均采用近似方法）。此外，结果还表明我们在特定隐私级别下计算了秘密格拉姆矩阵的精确逆平方根，这在该领域文献中尚未被涉及。我们还在合成数据集上分析了CECILIA在不同设置下的可扩展性。该框架展现出巨大潜力，能够通过其基础构件使其他机器学习算法及更广泛的计算任务实现隐私可计算性。