Formal privacy metrics provide compliance-oriented guarantees but often fail to quantify actual linkability in released datasets. We introduce CVPL (Cluster-Vector-Projection Linkage), a geometric framework for post-hoc assessment of linkage risk between original and protected tabular data. CVPL represents linkage analysis as an operator pipeline comprising blocking, vectorization, latent projection, and similarity evaluation, yielding continuous, scenario-dependent risk estimates rather than binary compliance verdicts. We formally define CVPL under an explicit threat model and introduce threshold-aware risk surfaces, R(λ, τ), that capture the joint effects of protection strength (λ) and attacker strictness (τ). We establish a progressive blocking strategy with monotonicity guarantees, enabling anytime risk estimation with valid lower bounds. We show that classical Fellegi-Sunter linkage emerges as a special case of CVPL under restrictive assumptions, and that violations of these assumptions can lead to systematic over-linking bias. Empirical validation on 10,000 records across 19 protection configurations shows that formal k-anonymity compliance may coexist with substantial empirical linkability, a significant portion of which arises from non-quasi-identifier behavioral patterns. CVPL provides interpretable diagnostics that identify which features drive linkage feasibility, supporting privacy impact assessment, comparison of protection mechanisms, and utility-risk trade-off analysis.
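To make the operator pipeline and the risk surface concrete, the following is a minimal illustrative sketch, not the reference implementation of CVPL. Every name in it (`linkage_risk`, `risk_surface`, `unit_rows`) is hypothetical, and several choices are assumptions made only for illustration: blocking is an exact match on a precomputed key, vectorization is standardization of numeric columns, the latent projection is a PCA-style basis obtained by SVD, similarity is cosine similarity, and protection strength lambda is modeled as the scale of additive Gaussian noise. Row i of the protected table is assumed to derive from row i of the original table.

```python
import numpy as np


def unit_rows(Z):
    """Scale each row to unit length; all-zero rows are left as zeros."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return Z / norms


def linkage_risk(orig, prot, blocks_orig, blocks_prot, tau, dim=2):
    """Fraction of protected records whose best in-block candidate is the true
    original record, with cosine similarity of at least tau."""
    orig = np.asarray(orig, dtype=float)
    prot = np.asarray(prot, dtype=float)
    blocks_orig = np.asarray(blocks_orig)
    blocks_prot = np.asarray(blocks_prot)

    # Vectorization (assumed): standardize with statistics of the original data.
    mean, std = orig.mean(axis=0), orig.std(axis=0)
    std[std == 0] = 1.0
    vo, vp = (orig - mean) / std, (prot - mean) / std

    # Latent projection (assumed): top-`dim` right singular vectors of the originals.
    _, _, vt = np.linalg.svd(vo, full_matrices=False)
    basis = vt[:dim].T
    zo, zp = unit_rows(vo @ basis), unit_rows(vp @ basis)

    hits = 0
    for i in range(len(zp)):
        # Blocking (assumed): compare only against originals with the same key.
        cand = np.flatnonzero(blocks_orig == blocks_prot[i])
        if cand.size == 0:
            continue
        # Similarity evaluation: cosine similarity of unit vectors is a dot product.
        sims = zo[cand] @ zp[i]
        best = int(np.argmax(sims))
        if sims[best] >= tau and cand[best] == i:
            hits += 1
    return hits / len(zp)


def risk_surface(orig, blocks, lambdas, taus, seed=0):
    """Grid R[a, b] of linkage risk at noise scale lambdas[a] and threshold taus[b]."""
    orig = np.asarray(orig, dtype=float)
    rng = np.random.default_rng(seed)
    # One shared noise draw, so lambda changes only its scale.
    noise = rng.standard_normal(orig.shape)
    surface = np.empty((len(lambdas), len(taus)))
    for a, lam in enumerate(lambdas):
        prot = orig + lam * noise
        for b, tau in enumerate(taus):
            surface[a, b] = linkage_risk(orig, prot, blocks, blocks, tau)
    return surface
```

In this sketch the best candidate for a record does not depend on tau, so for a fixed lambda the estimated risk can only stay the same or fall as the attacker's threshold tightens. How the risk changes with lambda depends on the data and on the protection mechanism actually used, which this toy noise model does not represent.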