This paper considers the problem of privately releasing sample means of speed values from traffic datasets. Our key contribution is the development of user-level differentially private algorithms whose parameter values are carefully chosen to yield low estimation errors on real-world datasets while preserving privacy. We test our algorithms on ITMS (Intelligent Traffic Management System) data from an Indian city, where the speeds of different buses are drawn in a potentially non-i.i.d. manner from an unknown distribution, and where the number of speed samples contributed may differ from bus to bus. We then apply our algorithms to a synthetic dataset, generated from the ITMS data, that has either a large number of users or a large number of samples per user. For these settings, we provide recommendations for the choices of parameters and algorithm subroutines that result in low estimation errors while guaranteeing user-level privacy.
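To make the setting concrete, the following is a minimal sketch of one standard way to release a user-level differentially private sample mean. It is not taken from the paper and should not be read as the authors' algorithm. It assumes that speeds are clipped to a range `[0, upper]`, that each user (bus) contributes at most `max_per_user` samples, that the per-user sample counts are public so neighboring datasets differ only in one user's values, and that Laplace noise is used. The function name, parameter names, and example values are all hypothetical.

```python
import numpy as np


def user_level_private_mean(samples_by_user, upper, max_per_user, epsilon, rng=None):
    """Illustrative epsilon-DP (user-level) estimate of the mean speed.

    Assumptions (not from the paper): values lie in [0, upper] after
    clipping, per-user sample counts are public, and neighboring datasets
    differ in the values of exactly one user.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Keep at most `max_per_user` samples per user and clip each value.
    clipped = [
        np.clip(np.asarray(values, dtype=float)[:max_per_user], 0.0, upper)
        for values in samples_by_user.values()
    ]

    total = sum(len(v) for v in clipped)
    if total == 0:
        raise ValueError("no samples to average")

    mean = sum(v.sum() for v in clipped) / total

    # Changing all retained samples of one user moves the sum by at most
    # (number of that user's retained samples) * upper.
    sensitivity = upper * max(len(v) for v in clipped) / total

    # Laplace mechanism: noise scale = sensitivity / epsilon.
    return mean + rng.laplace(0.0, sensitivity / epsilon)


# Hypothetical toy data: speeds (km/h) reported by three buses.
toy_data = {"bus_a": [30.0, 40.0, 50.0], "bus_b": [20.0], "bus_c": [70.0, 10.0]}
estimate = user_level_private_mean(
    toy_data, upper=65.0, max_per_user=2, epsilon=1.0,
    rng=np.random.default_rng(0),
)
```

In this sketch, lowering `max_per_user` reduces the sensitivity, and hence the noise, at the cost of discarding samples, while lowering `upper` reduces the noise at the cost of clipping bias. This is the kind of trade-off that parameter choices in such algorithms have to balance.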