Data privacy is a core tenet of responsible computing, and in the United States, differential privacy (DP) is the dominant technical operationalization of privacy-preserving data analysis. In this study, we qualitatively examine one class of DP mechanisms: private data synthesizers. To that end, we conducted semi-structured interviews with data experts: academics and practitioners who regularly work with data. Broadly, our findings suggest that quantitative DP benchmarks must be grounded in practitioner needs, and that communication challenges around DP persist. Participants expressed a need for context-aware DP solutions, emphasizing parity between research outcomes on real and synthetic data. Our analysis yielded three recommendations: (1) improve existing sanitized benchmarks, which are currently insufficient, because successful DP implementations require well-documented, partner-vetted use cases; (2) organizations using DP synthetic data should publish discipline-specific standards of evidence; and (3) tiered data access models could allow researchers to gradually gain access to sensitive data based on demonstrated competence with high-privacy, low-fidelity synthetic data.