Machine learning has made remarkable progress across a wide range of fields. In many scenarios, learning is performed on datasets containing sensitive information, making privacy protection essential for learning algorithms. In this work, we study pure private learning in the agnostic model -- a framework reflecting the learning process in practice. We examine the number of users required under item-level privacy (where each user contributes one example) and user-level privacy (where each user contributes multiple examples), and derive several improved upper bounds. For item-level privacy, our algorithm achieves a near-optimal bound for general concept classes. We extend this result to the user-level setting, obtaining a tighter upper bound than the one proved by Ghazi et al. (2023). Finally, we consider the problem of learning thresholds under user-level privacy and present an algorithm with nearly tight user complexity.