Sharing tabular data is a common form of data exchange. However, releasing sensitive records without adequate protection can compromise individual privacy, so privacy-preserving data sharing is essential. Differential privacy (DP) is widely regarded as the gold standard for data privacy. Even so, current DP methods often produce privacy-preserving tabular datasets with limited practical utility, because they perturb the data heavily and disregard the tables' utility dynamics. In addition, selective attribute release has received little research attention, particularly in the context of controlled sharing of partially perturbed data, even though it matters in real-world scenarios such as cross-agency data sharing. We introduce OptimShare, a utility-focused, multi-criteria solution that selectively perturbs input datasets so that the released data is optimized for specific real-world applications. OptimShare combines principles from differential privacy, fuzzy logic, and probability theory into an integrated tool for privacy-preserving data sharing. Empirical assessments show that OptimShare balances improved data utility with robust privacy across a range of real-world problem scenarios.
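To make the idea of partially perturbed sharing concrete, the following is a minimal sketch of perturbing only a chosen subset of numeric attributes with the standard Laplace mechanism. It is not OptimShare's algorithm: the abstract does not specify how attributes are selected or how noise is calibrated, and the fuzzy-logic and multi-criteria components are not modeled here. The function names `laplace_noise` and `perturb_selected`, the `bounds` parameter, and the example columns are all hypothetical. The sketch assumes each selected value is clipped to a known range, so that noise of scale `(hi - lo) / epsilon` gives epsilon-DP for that single value; perturbing several attributes of one record composes these budgets additively, and attributes left unperturbed receive no DP protection.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with rate 1/scale
    # follows a Laplace distribution with mean 0 and the given scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_selected(rows, bounds, epsilon, seed=None):
    """Return a copy of `rows` with Laplace noise added to selected columns.

    rows    -- list of dicts, one per record (hypothetical table layout)
    bounds  -- {column: (lo, hi)} for the numeric columns to perturb;
               columns not listed here are released unchanged
    epsilon -- privacy budget spent on each perturbed value (assumption)
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    released = []
    for row in rows:
        out = dict(row)  # copy, so the input table is not modified
        for col, (lo, hi) in bounds.items():
            # Clip to the assumed range so the sensitivity is at most hi - lo.
            clipped = min(max(row[col], lo), hi)
            out[col] = clipped + laplace_noise((hi - lo) / epsilon, rng)
        released.append(out)
    return released


# Illustrative records; only "age" is perturbed.
table = [
    {"age": 34, "income": 52000.0, "region": "north"},
    {"age": 61, "income": 48000.0, "region": "south"},
]
shared = perturb_selected(table, {"age": (0, 100)}, epsilon=1.0, seed=7)
print(shared)
```

Choosing which attributes to place in `bounds`, and how much budget each one receives, is exactly the kind of utility-versus-privacy decision the abstract describes as application-specific; in this sketch that choice is left entirely to the caller.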