A key task in managing distributed, sensitive data is measuring the extent to which a distribution changes. Understanding this drift can support a variety of federated learning and analytics tasks. However, in many practical settings, sharing such information can be undesirable (e.g., due to privacy concerns) or infeasible (e.g., due to high communication costs). In this work, we describe novel algorithmic approaches for estimating the KL divergence of data across federated models of computation, under differential privacy. We analyze their theoretical properties and present an empirical study of their performance. We explore parameter settings that optimize the accuracy of each algorithm for its setting; these yield sub-variations applicable to real-world tasks with different context- and application-specific trust requirements. Our experimental results confirm that our private estimators achieve accuracy comparable to a baseline algorithm without differential privacy guarantees.
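To make the estimation task concrete, the following is a minimal sketch of a generic plug-in KL divergence estimator with Laplace-perturbed histograms. This is an illustrative baseline only, not the paper's algorithm; the function name, the even split of the privacy budget between the two histograms, and the unit-sensitivity assumption on the counts are all assumptions made for the sketch.

```python
import numpy as np

def private_kl_estimate(p_counts, q_counts, epsilon, rng=None):
    """Plug-in KL(P || Q) estimate from two histograms of counts,
    with Laplace noise added to each count for differential privacy.

    Illustrative baseline, not the paper's algorithm. Assumes each
    user contributes to one bin of one histogram (L1 sensitivity 1),
    and splits the budget epsilon evenly between the two histograms.
    """
    rng = np.random.default_rng(rng)
    scale = 2.0 / epsilon  # Laplace scale = sensitivity / (epsilon / 2)
    p_noisy = np.maximum(p_counts + rng.laplace(0.0, scale, len(p_counts)), 0.0)
    q_noisy = np.maximum(q_counts + rng.laplace(0.0, scale, len(q_counts)), 0.0)
    # Smooth to avoid zero probabilities, then renormalize.
    p = (p_noisy + 1e-6) / (p_noisy + 1e-6).sum()
    q = (q_noisy + 1e-6) / (q_noisy + 1e-6).sum()
    return float(np.sum(p * np.log(p / q)))
```

With a generous budget (large epsilon) the noise is negligible and the estimate tracks the non-private plug-in KL; as epsilon shrinks, the injected noise dominates, which is the accuracy/privacy trade-off the abstract's parameter study navigates.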