Anonymizing microdata requires balancing the reduction of disclosure risk against the preservation of data utility. Traditional evaluations often rely on single measures or two-dimensional risk-utility (R-U) maps, but real-world assessments involve multiple, often correlated, indicators of both risk and utility, and comparing these measures pairwise is inefficient and incomplete. We therefore systematically compare six visualization approaches for evaluating multiple risk and utility measures simultaneously: heatmaps, dot plots, composite scatterplots, parallel coordinate plots, radial profile charts, and PCA-based biplots. We introduce blockwise PCA for composite scatterplots and joint PCA for biplots that simultaneously reveal method performance and measure interrelationships. By systematically identifying Pareto-optimal methods under every approach, we demonstrate how multivariate visualization supports a better-informed selection of anonymization methods.
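As a minimal sketch of what identifying Pareto-optimal methods across several risk and utility measures can look like, the following Python snippet marks every anonymization method that is not dominated by another. It assumes that lower is better for every risk measure and higher is better for every utility measure. The function name `pareto_optimal`, the method labels, and all scores are hypothetical and chosen for illustration only; they are not taken from the study.

```python
import numpy as np


def pareto_optimal(risk, utility):
    """Return a boolean mask marking non-dominated methods (rows).

    risk:    (n_methods, n_risk) array, lower is better (assumption)
    utility: (n_methods, n_utility) array, higher is better (assumption)
    """
    risk = np.asarray(risk, dtype=float)
    utility = np.asarray(utility, dtype=float)
    # Negate utility so that every column becomes a cost to minimize.
    costs = np.hstack([risk, -utility])
    n_methods = costs.shape[0]
    mask = np.ones(n_methods, dtype=bool)
    for i in range(n_methods):
        # Method j dominates method i if j is no worse on every measure
        # and strictly better on at least one.
        no_worse = np.all(costs <= costs[i], axis=1)
        strictly_better = np.any(costs < costs[i], axis=1)
        if np.any(no_worse & strictly_better):
            mask[i] = False
    return mask


# Hypothetical scores for four methods: two risk measures, one utility measure.
methods = ["A", "B", "C", "D"]
risk = [[0.10, 0.20], [0.30, 0.25], [0.05, 0.40], [0.35, 0.45]]
utility = [[0.70], [0.90], [0.60], [0.50]]

mask = pareto_optimal(risk, utility)
pareto_methods = [m for m, keep in zip(methods, mask) if keep]
print(pareto_methods)  # ['A', 'B', 'C']
```

In this toy example, method D is dropped because method A has lower values on both risk measures and higher utility. The pairwise loop is quadratic in the number of methods, which is unlikely to matter for the handful of anonymization methods typically compared.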