Anonymizing microdata requires balancing reduced disclosure risk against preserved data utility. Traditional evaluations often rely on single measures or two-dimensional risk-utility (R-U) maps, but real-world assessments involve multiple, often correlated, indicators of both risk and utility, and comparing these measures pairwise can be inefficient and incomplete. We therefore systematically compare six visualization approaches for evaluating multiple risk and utility measures simultaneously: heatmaps, dot plots, composite scatterplots, parallel coordinate plots, radial profile charts, and PCA-based biplots. We introduce blockwise PCA for composite scatterplots and joint PCA for biplots that simultaneously reveal method performance and the interrelationships among measures. By systematically identifying Pareto-optimal methods across all approaches, we demonstrate how multivariate visualization supports a more informed selection of anonymization methods.
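To make the notion of Pareto-optimal methods concrete, the following minimal sketch filters a set of anonymization methods down to the non-dominated ones. It is an illustration under stated assumptions, not the procedure used in the paper: the method names and scores are hypothetical, and every measure (risk and utility loss alike) is assumed to be oriented so that lower values are better.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` dominates `b`: no worse on every measure and
    strictly better on at least one (assumption: lower is better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_optimal(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of methods that no other method dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(
            dominates(t, s) for other, t in scores.items() if other != name
        )
    ]


# Hypothetical scores per method: (risk_1, risk_2, utility_loss_1, utility_loss_2).
scores = {
    "method_A": (0.10, 0.20, 0.50, 0.40),
    "method_B": (0.30, 0.25, 0.20, 0.15),
    "method_C": (0.35, 0.30, 0.25, 0.20),  # worse than method_B on every measure
    "method_D": (0.10, 0.20, 0.60, 0.45),  # same risk as method_A, more utility loss
}

front = pareto_optimal(scores)
print(front)
```

In this toy data, method_A and method_B trade lower risk against lower utility loss, so neither dominates the other and both remain on the Pareto front, while method_C and method_D are each dominated. Measures where higher is better (for example, a utility score rather than a utility loss) would need to be negated or otherwise re-oriented before applying this check.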