An unbiased $m$-sparsification of a vector $p\in \mathbb{R}^n$ is a random vector $Q\in \mathbb{R}^n$ with mean $p$ that has at most $m<n$ nonzero coordinates. Unbiased sparsification compresses the original vector without introducing bias; it arises in various contexts, such as in federated learning and sampling sparse probability distributions. Ideally, unbiased sparsification should also minimize the expected value of a divergence function $\mathsf{Div}(Q,p)$ that measures how far away $Q$ is from the original $p$. If $Q$ is optimal in this sense, then we call it efficient. Our main results describe efficient unbiased sparsifications for divergences that are either permutation-invariant or additively separable. Surprisingly, the characterization for permutation-invariant divergences is robust to the choice of divergence function, in the sense that our class of optimal $Q$ for squared Euclidean distance coincides with our class of optimal $Q$ for Kullback-Leibler divergence, or indeed any of a wide variety of divergences.
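To make the definition concrete, here is a minimal sketch of one simple unbiased sparsifier, the classical rand-$m$ scheme: keep a uniformly random subset of $m$ coordinates and rescale the survivors by $n/m$ so that $\mathbb{E}[Q]=p$. This is only an illustration of the unbiasedness property; it is not claimed to be the efficient (divergence-minimizing) construction characterized in the paper.

```python
import numpy as np

def rand_m_sparsify(p, m, rng=None):
    """Rand-m unbiased sparsifier: keep a uniform random subset of m
    coordinates, rescaled by n/m so that E[Q] = p coordinate-wise."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(p, dtype=float)
    n = len(p)
    idx = rng.choice(n, size=m, replace=False)  # uniform m-subset
    q = np.zeros(n)
    q[idx] = (n / m) * p[idx]  # rescale kept coordinates
    return q

# Empirical check of unbiasedness: the sample mean of many draws
# should approach p (each draw has at most m nonzero coordinates).
p = np.array([0.5, 0.3, 0.15, 0.05])
avg = np.mean([rand_m_sparsify(p, m=2, rng=np.random.default_rng(i))
               for i in range(20000)], axis=0)
```

Each coordinate is kept with probability $m/n$ and scaled by $n/m$, so $\mathbb{E}[Q_i] = (m/n)\cdot(n/m)\,p_i = p_i$; the scheme is unbiased but, as the paper's results suggest, generally not optimal for a given divergence $\mathsf{Div}$.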