Privately counting distinct elements in a stream is a fundamental data-analysis problem with many applications in machine learning. In the turnstile model, Jain et al. [NeurIPS2023] initiated the study of this problem parameterized by the maximum flippancy of any element, i.e., the number of times the count of an element changes from 0 to a nonzero value or vice versa. They give an item-level $(\epsilon,\delta)$-differentially private algorithm whose additive error is tight with respect to that parameterization. In this work, we show that a very simple algorithm based on the sparse vector technique achieves a tight additive error for both item-level $(\epsilon,\delta)$-differential privacy and item-level $\epsilon$-differential privacy with respect to a different parameterization, namely the sum of all flippancies. Our second result shows that, for a large class of algorithms, including all existing differentially private algorithms for this problem, the lower bound from item-level differential privacy extends to event-level differential privacy. This partially answers an open question posed by Jain et al. [NeurIPS2023].
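To make the flavor of the sparse-vector-based approach concrete, the following is a minimal illustrative sketch, not the paper's algorithm or its exact noise calibration: the stream maintains the true distinct count under turnstile updates, and a new noisy count is released only when the true count has drifted past a noisy threshold since the last release. The class name, the threshold parameter, and the split of $\epsilon$ between the threshold and query noise are all assumptions made here for illustration.

```python
import math
import random


def laplace(scale):
    # Sample Laplace(0, scale) noise via inverse-CDF sampling.
    u = random.random() - 0.5
    sign = 1 if u >= 0 else -1
    return -scale * sign * math.log(1 - 2 * abs(u))


class SVTDistinctCount:
    """Illustrative sketch (not the paper's algorithm): release an
    updated distinct count only when the true count has moved past a
    noisy threshold, in the style of the sparse vector technique.
    The epsilon split below (2/eps and 4/eps scales) is an assumed,
    conventional SVT calibration, not a claim about the paper."""

    def __init__(self, epsilon, threshold):
        self.eps = epsilon
        self.threshold = threshold
        self.counts = {}       # item -> signed count (turnstile updates)
        self.distinct = 0      # current true number of distinct items
        self.released = 0.0    # last privately released estimate
        self._resample_threshold_noise()

    def _resample_threshold_noise(self):
        self.noisy_threshold = self.threshold + laplace(2.0 / self.eps)

    def update(self, item, delta):
        old = self.counts.get(item, 0)
        new = old + delta
        self.counts[item] = new
        if old == 0 and new != 0:
            self.distinct += 1   # a "flip" from zero to nonzero
        elif old != 0 and new == 0:
            self.distinct -= 1   # a flip back to zero

    def estimate(self):
        # Query: how far has the true count drifted from the last release?
        gap = abs(self.distinct - self.released)
        if gap + laplace(4.0 / self.eps) > self.noisy_threshold:
            # Above threshold: release a fresh noisy count and
            # resample the threshold noise for the next round.
            self.released = self.distinct + laplace(2.0 / self.eps)
            self._resample_threshold_noise()
        return self.released
```

The point of the sketch is the SVT structure: most stream updates trigger no release at all, so the privacy cost is paid only at the (few) threshold crossings, whose number is governed by how often counts flip between zero and nonzero.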