SQL is the de facto interface for exploratory data analysis; however, releasing exact query results can expose sensitive information through membership or attribute inference attacks. Differential privacy (DP) provides rigorous privacy guarantees, but in practice, DP alone may not satisfy governance requirements such as the \emph{minimum frequency rule}, which requires each released group (cell) to include contributions from at least $k$ distinct individuals. In this paper, we present \textbf{DPSQL+}, a privacy-preserving SQL library that simultaneously enforces user-level $(\varepsilon,\delta)$-DP and the minimum frequency rule. DPSQL+ adopts a modular architecture consisting of: (i) a \emph{Validator} that statically restricts queries to a DP-safe subset of SQL; (ii) an \emph{Accountant} that consistently tracks cumulative privacy loss across multiple queries; and (iii) a \emph{Backend} that interfaces with various database engines, ensuring portability and extensibility. Experiments on the TPC-H benchmark demonstrate that DPSQL+ achieves practical accuracy across a wide range of analytical workloads -- from basic aggregates to quadratic statistics and join operations -- and allows substantially more queries under a fixed global privacy budget than prior libraries in our evaluation.
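To make the combination of budget accounting and the minimum frequency rule concrete, the following is a minimal sketch and not the DPSQL+ implementation. The names \texttt{Accountant}, \texttt{BudgetExceeded}, \texttt{laplace\_noise}, and \texttt{noisy\_group\_counts} are hypothetical, and so are the design choices: basic sequential composition for the budget, the Laplace mechanism for counts, and a cap on the number of groups each user may contribute to. The sketch suppresses cells using exact distinct-user counts, which illustrates the governance rule only; a DP system must also make the suppression decision itself privacy-safe, which this sketch does not attempt.

```python
import random
from collections import defaultdict


class BudgetExceeded(Exception):
    """Raised when a query would exceed the global privacy budget."""


class Accountant:
    """Hypothetical accountant using basic sequential composition:
    the epsilons and deltas of answered queries simply add up."""

    def __init__(self, total_epsilon, total_delta):
        self.total_epsilon = total_epsilon
        self.total_delta = total_delta
        self.spent_epsilon = 0.0
        self.spent_delta = 0.0

    def charge(self, epsilon, delta=0.0):
        # Refuse the query before any data is touched if it would overspend.
        if (self.spent_epsilon + epsilon > self.total_epsilon + 1e-12
                or self.spent_delta + delta > self.total_delta + 1e-12):
            raise BudgetExceeded("query would exceed the global budget")
        self.spent_epsilon += epsilon
        self.spent_delta += delta


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_group_counts(rows, k, epsilon, max_groups_per_user, accountant, rng):
    """Noisy distinct-user counts per group from (user_id, group) rows.

    Assumptions of this sketch (not taken from the paper):
      * each user is limited to `max_groups_per_user` groups, so the L1
        sensitivity of the count vector is `max_groups_per_user`;
      * cells with fewer than k distinct users are dropped using the exact
        count, which is not differentially private on its own.
    """
    accountant.charge(epsilon)

    # Collect the distinct groups each user appears in.
    groups_of_user = defaultdict(set)
    for user_id, group in rows:
        groups_of_user[user_id].add(group)

    # Bound each user's contribution, then count distinct users per group.
    users_in_group = defaultdict(set)
    for user_id, groups in groups_of_user.items():
        for group in sorted(groups)[:max_groups_per_user]:
            users_in_group[group].add(user_id)

    scale = max_groups_per_user / epsilon
    released = {}
    for group, users in users_in_group.items():
        if len(users) < k:
            continue  # minimum frequency rule: suppress small cells
        released[group] = len(users) + laplace_noise(scale, rng)
    return released
```

For example, with $k=3$, a group backed by five distinct users is released with noise, a group backed by only two users is suppressed, and each call deducts its $\varepsilon$ from the global budget until \texttt{charge} raises \texttt{BudgetExceeded}.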