Despite the potential of differentially private data visualization to harmonize data analysis and privacy, research in this area remains relatively underdeveloped. Boxplots are widely used to summarize a dataset and to compare multiple datasets. We therefore introduce a differentially private boxplot and evaluate how effectively it displays the location, scale, skewness and tails of a given empirical distribution. In our theoretical exposition, we show that the location and scale of the boxplot are estimated with optimal sample complexity, and that the skewness and tails are estimated consistently. In simulations, we show that this boxplot performs similarly to a non-private boxplot and outperforms a boxplot naively constructed from existing differentially private quantile algorithms. Additionally, a real data analysis of Airbnb listings shows that comparable analysis can be achieved with the differentially private boxplot.
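To make the naive baseline mentioned above concrete, the following is a minimal sketch of a boxplot assembled from an existing differentially private quantile routine. It is not the boxplot proposed here, and it is not necessarily the exact baseline used in the simulations. It assumes the standard exponential-mechanism quantile estimator on data clipped to publicly known bounds `lo` and `hi`, an equal split of the privacy budget `eps` across five quantiles (basic composition), and whiskers placed at the 5% and 95% quantiles. The function names, the whisker levels and the budget split are all illustrative assumptions.

```python
import numpy as np


def dp_quantile(x, q, eps, lo, hi, rng):
    """eps-DP estimate of the q-th quantile via the exponential mechanism.

    Assumes lo < hi are public bounds; data are clipped to [lo, hi].
    """
    if not lo < hi:
        raise ValueError("require lo < hi")
    z = np.sort(np.clip(np.asarray(x, dtype=float), lo, hi))
    n = z.size
    # n + 2 edges define n + 1 candidate intervals for the output.
    edges = np.concatenate(([lo], z, [hi]))
    widths = np.diff(edges)
    # Interval i has exactly i data points below it; utility has sensitivity 1.
    utility = -np.abs(np.arange(n + 1) - q * n)
    with np.errstate(divide="ignore"):  # zero-width intervals get weight 0
        logw = np.log(widths) + 0.5 * eps * utility
    logw -= logw.max()  # stabilise before exponentiating
    p = np.exp(logw)
    p /= p.sum()
    i = rng.choice(n + 1, p=p)
    # Release a uniform draw from the selected interval.
    return float(rng.uniform(edges[i], edges[i + 1]))


def naive_dp_boxplot(x, eps, lo, hi, probs=(0.05, 0.25, 0.5, 0.75, 0.95), seed=None):
    """Naive DP boxplot summary: one private quantile per boxplot element.

    The whisker levels in `probs` are an assumption made for illustration.
    """
    rng = np.random.default_rng(seed)
    eps_each = eps / len(probs)  # basic composition: total budget is eps
    # Sorting is post-processing, so it does not affect the privacy guarantee.
    est = sorted(dp_quantile(x, q, eps_each, lo, hi, rng) for q in probs)
    names = ("lower_whisker", "q1", "median", "q3", "upper_whisker")
    return dict(zip(names, est))
```

For example, `naive_dp_boxplot(data, eps=1.0, lo=0.0, hi=1000.0, seed=0)` returns the five values needed to draw the box and whiskers. Because each quantile receives only a fifth of the budget and the estimates are made independently, derived quantities such as the interquartile range inherit noise from two separate releases, which is one plausible reason a purpose-built private boxplot can do better.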