Topological Data Analysis (TDA) offers a suite of computational tools that quantify shape features of high-dimensional data for use in modern statistical and predictive machine learning (ML) models. In particular, persistent homology (PH) takes in data (e.g., point clouds, images, time series) and derives compact representations of latent topological structures, known as persistence diagrams (PDs). Because PDs are inherently noise-tolerant and interpretable, provide a solid basis for data analysis, and can be made compatible with the expansive set of well-established ML model architectures, PH has been widely adopted for model development, including on sensitive data such as genomic, cancer, sensor network, and financial data. Thus, TDA should be incorporated into secure end-to-end data analysis pipelines. In this paper, we take the first step toward addressing this challenge by developing a version of the fundamental algorithm for computing PH on encrypted data using homomorphic encryption (HE).