This paper is the first to attempt differentially private (DP) topological data analysis (TDA), producing near-optimal private persistence diagrams. We analyze the sensitivity of persistence diagrams with respect to the bottleneck distance and show that the sensitivity of the commonly used \v{C}ech complex does not decrease as the sample size $n$ increases, which makes its persistence diagrams difficult to privatize. As an alternative, we show that the persistence diagram obtained from the $L^1$-distance to measure (DTM) has sensitivity $O(1/n)$. Based on this sensitivity analysis, we propose using the exponential mechanism with a utility function defined in terms of the bottleneck distance between $L^1$-DTM persistence diagrams. We also derive upper and lower bounds on the accuracy of our privacy mechanism; these bounds indicate that the privacy error of our mechanism is near-optimal. We demonstrate the performance of our privatized persistence diagrams through simulations as well as on a real dataset tracking human movement.
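To make the role of the $O(1/n)$ sensitivity concrete, the following is a minimal sketch of the generic exponential mechanism, not the paper's implementation. It assumes a finite list of candidate diagrams whose bottleneck distances to the non-private $L^1$-DTM diagram have already been computed; the finite candidate set, the function name `exponential_mechanism`, and the sensitivity constant used in the example are all illustrative assumptions. With utility $u = -d$ and utility sensitivity $\Delta$, sampling a candidate with probability proportional to $\exp(\varepsilon u / (2\Delta))$ is the standard construction.

```python
import math
import random


def exponential_mechanism(distances, epsilon, sensitivity, rng=None):
    """Return an index i sampled with probability proportional to
    exp(-epsilon * distances[i] / (2 * sensitivity)).

    distances   -- hypothetical precomputed bottleneck distances from each
                   candidate diagram to the non-private diagram
    epsilon     -- privacy budget (> 0)
    sensitivity -- sensitivity of the utility (assumed to scale like 1/n)
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    if not distances:
        raise ValueError("need at least one candidate")
    rng = rng or random.Random()

    # Utility is the negative distance: closer candidates are preferred.
    utilities = [-d for d in distances]

    # Subtract the maximum utility before exponentiating for numerical
    # stability; this does not change the sampling distribution.
    u_max = max(utilities)
    weights = [
        math.exp(epsilon * (u - u_max) / (2.0 * sensitivity))
        for u in utilities
    ]
    return rng.choices(range(len(distances)), weights=weights, k=1)[0]


if __name__ == "__main__":
    n = 1000                      # sample size
    sensitivity = 1.0 / n         # assumed constant 1 in the O(1/n) bound
    candidate_distances = [0.05, 0.20, 0.80]  # made-up illustrative values
    chosen = exponential_mechanism(candidate_distances, epsilon=1.0,
                                   sensitivity=sensitivity)
    print("selected candidate index:", chosen)
```

Because the weights depend on $\varepsilon d / (2\Delta)$, a sensitivity that shrinks like $1/n$ concentrates the sampling distribution on candidates close to the non-private diagram as $n$ grows, whereas a sensitivity that stays constant in $n$ yields no such concentration.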