Differential privacy (DP) implementations are notoriously prone to errors, with subtle bugs frequently invalidating theoretical guarantees. Existing verification methods are often impractical: formal tools are too restrictive, while black-box statistical auditing is intractable for complex pipelines and fails to pinpoint the source of the bug. This paper introduces Re:cord-play, a gray-box auditing paradigm that inspects the internal state of DP algorithms. By running an instrumented algorithm on neighboring datasets with identical randomness, Re:cord-play directly checks for data-dependent control flow and provides concrete falsification of sensitivity violations by comparing declared sensitivity against the empirically measured distance between internal inputs. We generalize this to Re:cord-play-sample, a full statistical audit that isolates and tests each component, including untrusted ones. We show that our novel testing approach is both effective and necessary by auditing 12 open-source libraries, including SmartNoise SDK, Opacus, and Diffprivlib, and uncovering 13 privacy violations that impact their theoretical guarantees. We release our framework as an open-source Python package, thereby making it easy for DP developers to integrate effective, computationally inexpensive, and seamless privacy testing as part of their software development lifecycle.
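The core gray-box check described above can be illustrated with a minimal sketch. All names here (`noisy_sum`, `audit`) are hypothetical and not the paper's actual API: we run an instrumented mechanism on two neighboring datasets with identical randomness, record its internal pre-noise inputs, and compare the measured distance against the declared sensitivity.

```python
import numpy as np

def noisy_sum(data, sensitivity=1.0, epsilon=1.0, rng=None, trace=None):
    # Instrumented Laplace mechanism (illustrative): record the internal
    # input (the raw sum) before noise is added.
    s = float(np.sum(data))
    if trace is not None:
        trace.append(s)
    return s + rng.laplace(scale=sensitivity / epsilon)

def audit(mechanism, d, d_prime, declared_sensitivity, seed=0):
    # Run the mechanism on neighboring datasets with identical randomness.
    trace_a, trace_b = [], []
    mechanism(d, rng=np.random.default_rng(seed), trace=trace_a)
    mechanism(d_prime, rng=np.random.default_rng(seed), trace=trace_b)
    # Data-dependent control flow shows up as diverging trace shapes.
    if len(trace_a) != len(trace_b):
        return "data-dependent control flow"
    # Sensitivity violation: measured internal distance exceeds the bound.
    gap = max(abs(a - b) for a, b in zip(trace_a, trace_b))
    if gap > declared_sensitivity:
        return f"sensitivity violation: measured {gap} > declared {declared_sensitivity}"
    return "no violation found"

# Neighboring datasets differ in one record of value 5, so the true
# sensitivity of the sum is 5; declaring 1.0 is a concrete bug the
# audit falsifies.
print(audit(noisy_sum, [1, 2, 5], [1, 2], declared_sensitivity=1.0))
```

Because the two runs share a seed, any difference in the recorded traces is attributable to the data alone, which is what lets the audit pinpoint the offending component rather than merely flag the pipeline as a whole.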