SQL is the de facto interface for exploratory data analysis; however, releasing exact query results can expose sensitive information through membership or attribute inference attacks. Differential privacy (DP) provides rigorous privacy guarantees, but in practice, DP alone may not satisfy governance requirements such as the \emph{minimum frequency rule}, which requires each released group (cell) to include contributions from at least $k$ distinct individuals. In this paper, we present \textbf{DPSQL+}, a privacy-preserving SQL library that simultaneously enforces user-level $(\varepsilon,\delta)$-DP and the minimum frequency rule. DPSQL+ adopts a modular architecture consisting of: (i) a \emph{Validator} that statically restricts queries to a DP-safe subset of SQL; (ii) an \emph{Accountant} that consistently tracks cumulative privacy loss across multiple queries; and (iii) a \emph{Backend} that interfaces with various database engines, ensuring portability and extensibility. Experiments on the TPC-H benchmark demonstrate that DPSQL+ achieves practical accuracy across a wide range of analytical workloads -- from basic aggregates to quadratic statistics and join operations -- and allows substantially more queries under a fixed global privacy budget than prior libraries in our evaluation.
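
For reference, the user-level guarantee named above is the standard $(\varepsilon,\delta)$-DP condition with neighboring databases defined per individual rather than per row. The symbols $\mathcal{M}$ (the randomized mechanism), $D \sim D'$ (neighboring databases), and $S$ (a set of outputs) are notation assumed here for illustration and are not taken from this section. A mechanism $\mathcal{M}$ satisfies user-level $(\varepsilon,\delta)$-DP if, for all databases $D \sim D'$ that differ only in the records contributed by a single individual and for every measurable output set $S$,
\[
  \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta .
\]
Under basic sequential composition, answering $m$ queries that are individually $(\varepsilon_i,\delta_i)$-DP is $\bigl(\sum_{i=1}^{m}\varepsilon_i,\ \sum_{i=1}^{m}\delta_i\bigr)$-DP. This is the simplest bound an accountant could track against a global budget; this section does not state which composition rule DPSQL+ uses, and a tighter one may apply.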