This paper is the first to attempt differentially private (DP) topological data analysis (TDA), producing near-optimal private persistence diagrams. We analyze the sensitivity of persistence diagrams with respect to the bottleneck distance, and we show that the commonly used \v{C}ech complex has sensitivity that does not decrease as the sample size $n$ increases. This makes the persistence diagrams of \v{C}ech complexes difficult to privatize. As an alternative, we show that the persistence diagram obtained from the $L^1$-distance to measure (DTM) has sensitivity $O(1/n)$. Based on this sensitivity analysis, we propose using the exponential mechanism with a utility function defined in terms of the bottleneck distance between $L^1$-DTM persistence diagrams. We also derive upper and lower bounds on the accuracy of our privacy mechanism; these bounds indicate that the privacy error of our mechanism is near-optimal. We demonstrate the performance of our privatized persistence diagrams through simulations as well as on a real dataset tracking human movement.
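To make the role of the $O(1/n)$ sensitivity concrete, the following is a minimal sketch of the generic exponential mechanism with a bottleneck-distance utility. The notation is assumed here for illustration and is not taken from the paper: $X$ is the dataset, $\mathrm{Dgm}(X)$ its $L^1$-DTM persistence diagram, $W_\infty$ the bottleneck distance, $\varepsilon$ the privacy parameter, and $\Delta$ the sensitivity. The paper's exact utility function may differ in normalization or in how the space of candidate diagrams is restricted.

$$
u(X, D) = -W_\infty\bigl(D, \mathrm{Dgm}(X)\bigr),
\qquad
\Delta = \sup_{X \sim X'} W_\infty\bigl(\mathrm{Dgm}(X), \mathrm{Dgm}(X')\bigr) = O(1/n),
$$

$$
p(D \mid X) \;\propto\; \exp\!\left( \frac{\varepsilon \, u(X, D)}{2\Delta} \right)
= \exp\!\left( -\frac{\varepsilon}{2\Delta}\, W_\infty\bigl(D, \mathrm{Dgm}(X)\bigr) \right).
$$

Here $X \sim X'$ denotes datasets differing in one point. By the triangle inequality, $|u(X, D) - u(X', D)| \le W_\infty\bigl(\mathrm{Dgm}(X), \mathrm{Dgm}(X')\bigr) \le \Delta$ for every candidate diagram $D$, so sampling $D$ from this density satisfies $\varepsilon$-DP by the standard exponential mechanism guarantee. Because $\Delta = O(1/n)$, the density concentrates around $\mathrm{Dgm}(X)$ as $n$ grows, whereas a sensitivity that does not shrink with $n$ (as for the \v{C}ech complex) gives no such concentration.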