To derive valuable insights from statistics, machine learning applications frequently analyze substantial amounts of data. In this work, we address the problem of designing efficient, secure techniques for probing large datasets, so that a scientist can conduct large-scale medical studies over specific attributes of patients' records while keeping their model private. We introduce a set of composable homomorphic operations and show how to combine private function evaluation with private thresholds via approximate fully homomorphic encryption. This allows us to design a new system named TETRIS, which solves the real-world use case of private functional exploration of large databases, where the statistical criteria remain hidden from the server that owns the patients' records. Our experiments show that TETRIS achieves practical performance over a large dataset of patients, even when evaluating elaborate statements composed of linear and nonlinear functions. Private insights can be extracted from a database of hundreds of thousands of patient records within a few minutes on a single thread, with an amortized time per database entry below 2 ms.