We study how to release summary statistics on a data stream subject to the constraint of differential privacy. In particular, we focus on releasing the family of symmetric norms, which are invariant under sign flips and coordinate-wise permutations on an input data stream and include $L_p$ norms, $k$-support norms, top-$k$ norms, and the box norm as special cases. Although it may be possible to design and analyze a separate mechanism for each symmetric norm, we propose a general parametrizable framework that releases, under differential privacy, a number of sufficient statistics from which approximations of all symmetric norms can be computed simultaneously. Our framework partitions the coordinates of the underlying frequency vector into levels based on their magnitude; within the important levels, it releases approximate frequencies for the "heavy" coordinates and approximate level sizes for the "light" coordinates. Surprisingly, our mechanism allows for the release of an arbitrary number of symmetric norm approximations without any overhead or additional loss in privacy. Moreover, our mechanism permits a $(1+\alpha)$-approximation to each of the symmetric norms and can be implemented using sublinear space in the streaming model for many regimes of the accuracy and privacy parameters.
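To make the level-based idea concrete, the following is a minimal, non-private Python sketch; it is not the paper's mechanism. It adds no noise, does not identify "important" levels or "heavy" coordinates, and does not run in sublinear space. It only illustrates why level sizes can serve as sufficient statistics: rounding each coordinate's magnitude down to a power of $(1+\alpha)$ changes any symmetric norm by at most a factor of $(1+\alpha)$. The function names (`lp_norm`, `top_k_norm`, `level_sizes`, `vector_from_levels`) and the sample vector are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def lp_norm(v, p):
    """L_p norm, a symmetric norm for p >= 1."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def top_k_norm(v, k):
    """Top-k norm: sum of the k largest magnitudes."""
    return sum(sorted((abs(x) for x in v), reverse=True)[:k])


def level_sizes(v, alpha):
    """Count nonzero coordinates per level.

    Level i holds coordinates with (1+alpha)^i <= |x| < (1+alpha)^(i+1).
    Assumption for this sketch: all levels are kept and no noise is added.
    """
    base = 1.0 + alpha
    return Counter(math.floor(math.log(abs(x), base)) for x in v if x != 0)


def vector_from_levels(levels, alpha):
    """Rebuild a surrogate vector in which each coordinate is rounded
    down to the lower boundary of its level."""
    base = 1.0 + alpha
    return [base ** i for i, count in levels.items() for _ in range(count)]


# Hypothetical frequency vector and accuracy parameter.
v = [7, -3, 0, 12, 1, -1, 5, 3]
alpha = 0.5

levels = level_sizes(v, alpha)
surrogate = vector_from_levels(levels, alpha)

# Any symmetric norm can be evaluated on the same surrogate vector.
for name, norm in [("L2", lambda u: lp_norm(u, 2)),
                   ("top-2", lambda u: top_k_norm(u, 2))]:
    exact, approx = norm(v), norm(surrogate)
    print(f"{name}: exact={exact:.4f}, from levels={approx:.4f}, "
          f"ratio={exact / approx:.4f}")
```

Because the level sizes depend only on the multiset of magnitudes, the same released statistics serve every symmetric norm at once, which is the intuition behind answering an arbitrary number of norm queries from a single release.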