Despite the potential of differentially private data visualization to reconcile data analysis with privacy, research in this area remains relatively underdeveloped. Boxplots are a widely used visualization for summarizing a dataset and for comparing multiple datasets. Motivated by this, we introduce a differentially private boxplot. We evaluate its effectiveness for displaying the location, scale, skewness and tails of a given empirical distribution. In our theoretical analysis, we show that the location and scale of the boxplot are estimated with optimal sample complexity, and that the skewness and tails are estimated consistently. In simulations, we show that this boxplot performs similarly to a non-private boxplot and outperforms a boxplot naively constructed from existing differentially private quantile algorithms. Additionally, we conduct a real data analysis of Airbnb listings, which shows that comparable analysis can be achieved through differentially private boxplot visualization.
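To make the "naively constructed" baseline concrete, here is a minimal sketch of one way such a boxplot could be built. It is not the method proposed in the paper, and the paper's actual baseline may differ. The sketch assumes the data lie in (or are clipped to) a known interval `[lo, hi]`, uses an exponential-mechanism quantile estimator, and splits the privacy budget `eps` equally across five quantiles by basic composition. The function names `dp_quantile` and `naive_dp_boxplot_stats`, and the choice of the five quantile levels, are illustrative assumptions.

```python
import numpy as np


def dp_quantile(x, q, eps, lo, hi, rng):
    """Illustrative eps-DP quantile via the exponential mechanism.

    Assumes neighboring datasets differ by replacing one record and that
    [lo, hi] is a public bound; values outside it are clipped.
    """
    x = np.sort(np.clip(np.asarray(x, dtype=float), lo, hi))
    n = len(x)
    edges = np.concatenate(([lo], x, [hi]))  # n + 2 points, n + 1 intervals
    widths = np.diff(edges)
    # Interval i has exactly i data points to its left.
    utility = -np.abs(np.arange(n + 1) - q * n)
    # The utility has sensitivity 1, so the weights are exp(eps * u / 2),
    # multiplied by the interval width (done here in log space).
    with np.errstate(divide="ignore"):
        log_w = np.log(widths) + eps * utility / 2.0
    log_w -= log_w.max()
    p = np.exp(log_w)
    p /= p.sum()
    i = rng.choice(n + 1, p=p)
    # Sample uniformly inside the selected interval.
    return float(rng.uniform(edges[i], edges[i + 1]))


def naive_dp_boxplot_stats(x, eps, lo, hi, rng):
    """Five private quantiles, each released with budget eps / 5."""
    levels = (0.0, 0.25, 0.5, 0.75, 1.0)  # assumed: min, Q1, median, Q3, max
    stats = [dp_quantile(x, q, eps / len(levels), lo, hi, rng) for q in levels]
    # Sorting is post-processing, so it does not consume privacy budget.
    return np.sort(np.array(stats))


rng = np.random.default_rng(0)
data = rng.normal(size=2000)
print(naive_dp_boxplot_stats(data, eps=5.0, lo=-5.0, hi=5.0, rng=rng))
```

Because the budget is divided five ways and the extreme quantiles are hard to estimate privately, the whisker values in a construction like this tend to be the noisiest part of the plot, which is one plausible reason a purpose-built private boxplot could do better.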