This paper considers the problem of privately releasing sample means of speed values from traffic datasets. Our key contribution is the development of user-level differentially private algorithms whose parameter values are carefully chosen to yield low estimation errors on real-world datasets while preserving privacy. We test our algorithms on ITMS (Intelligent Traffic Management System) data from an Indian city, where the speeds of different buses are drawn in a potentially non-i.i.d. manner from an unknown distribution, and where the number of speed samples contributed may differ from bus to bus. We then apply our algorithms to large synthetic datasets generated from the ITMS data. Here, we provide theoretical justification for the observed performance trends, and recommend choices of algorithm subroutines that result in low estimation errors. Finally, we characterize the best performance of algorithms based on pseudo-user creation on worst-case datasets via a minimax approach; this gives rise to a novel procedure for creating pseudo-users, which optimizes the worst-case total estimation error. The algorithms discussed in the paper are readily applicable to general spatio-temporal IoT datasets for releasing a differentially private mean of a desired value.
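As a rough illustration of what user-level privacy means for a sample mean, the following Python sketch keeps at most a fixed number of samples per bus and adds Laplace noise scaled to the resulting user-level sensitivity. It is not the algorithm developed in this paper. The function name, the assumption that speeds lie in [0, upper], and the assumption that the per-bus sample counts are public are all illustrative choices made here.

```python
import numpy as np


def clipped_user_level_dp_mean(samples_per_user, upper, max_per_user, epsilon, rng=None):
    """Hypothetical sketch: noisy sample mean with user-level epsilon-DP.

    Assumptions (not taken from the paper): every speed lies in [0, upper],
    and the number of samples each user contributes is public.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Keep at most max_per_user samples from each user and clamp them to [0, upper].
    kept = [
        np.clip(np.asarray(s, dtype=float)[:max_per_user], 0.0, upper)
        for s in samples_per_user
    ]
    total = sum(len(k) for k in kept)
    if total == 0:
        raise ValueError("no samples to average")

    mean = sum(k.sum() for k in kept) / total

    # Replacing all of one user's kept samples moves the sum by at most
    # upper * (that user's kept count), so the mean moves by at most:
    largest = max(len(k) for k in kept)
    sensitivity = upper * largest / total

    # Laplace mechanism: noise scale = sensitivity / epsilon.
    noise = rng.laplace(0.0, sensitivity / epsilon)
    return mean + noise, sensitivity
```

Lowering `max_per_user` shrinks the sensitivity, and hence the noise, but discards samples, so the clipped mean may drift from the true sample mean. Any choice of this parameter trades one error source against the other.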