Much of the research in differential privacy has focused on offline applications, under the assumption that all data is available at once. When these algorithms are applied in practice to streams, where data is collected over time, they either violate the privacy guarantees or yield poor utility. We derive an algorithm for differentially private synthetic streaming data generation, tailored in particular to spatial datasets. Furthermore, we provide a general framework for online selective counting among a collection of queries, which forms a basis for many tasks such as query answering and synthetic data generation. We verify the utility of our algorithm on both real-world and simulated datasets.
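To make the privacy-versus-utility tension concrete, the following minimal sketch shows a naive baseline, not the algorithm proposed here. It assumes a single counting query with sensitivity 1, and it assumes each individual may contribute to the count at every time step, so the per-step privacy costs add up under sequential composition. Splitting a total budget evenly over T steps then forces a Laplace noise scale of T / epsilon at each step. The function names (`laplace_noise`, `naive_stream_release`, `per_step_noise_scale`) are hypothetical and chosen for illustration only.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two
    independent exponential variables, each with mean `scale`."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def per_step_noise_scale(total_epsilon, num_steps):
    """Noise scale per time step when the total budget is split evenly.

    Assumption: sensitivity 1 per step and sequential composition
    across steps, so each step receives total_epsilon / num_steps.
    """
    per_step_epsilon = total_epsilon / num_steps
    return 1.0 / per_step_epsilon


def naive_stream_release(counts, total_epsilon, rng):
    """Release one noisy count per time step under an even budget split.

    This is a naive baseline for illustration only.
    """
    scale = per_step_noise_scale(total_epsilon, len(counts))
    return [c + laplace_noise(scale, rng) for c in counts]


if __name__ == "__main__":
    rng = random.Random(0)
    stream = [120, 135, 128, 140, 150]  # made-up counts for illustration
    print(naive_stream_release(stream, total_epsilon=1.0, rng=rng))
    # The noise scale grows linearly with the stream length.
    for steps in (10, 100, 1000):
        print(steps, per_step_noise_scale(1.0, steps))
```

Under these assumptions the noise scale grows linearly with the length of the stream, which is the poor-utility outcome described above. Reusing the full budget at every step instead would keep the noise small, but the cumulative privacy loss would grow with the number of steps.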