We study distributed goodness-of-fit testing for discrete distributions under bandwidth and differential privacy constraints. Information-constrained distributed goodness-of-fit testing has received considerable attention recently. The important case of discrete distributions is theoretically well understood in the classical setting, where all data is available in one "central" location. In a federated setting, however, data is distributed across multiple "locations" (e.g., servers) and cannot readily be shared because of, for example, bandwidth or privacy constraints that each server must satisfy. We show how recently derived results for goodness-of-fit testing for the mean of a multivariate Gaussian model extend to discrete distributions by leveraging Le Cam's theory of statistical equivalence. In doing so, we derive matching minimax upper and lower bounds for goodness-of-fit testing for discrete distributions under bandwidth or privacy constraints in the regime where the number of samples held locally is large.