An unbiased $m$-sparsification of a vector $p\in \mathbb{R}^n$ is a random vector $Q\in \mathbb{R}^n$ with mean $p$ that has at most $m<n$ nonzero coordinates. Unbiased sparsification compresses the original vector without introducing bias; it arises in various contexts, such as federated learning and the sampling of sparse probability distributions. Ideally, an unbiased sparsification should also minimize the expected value of a divergence function $\mathsf{Div}(Q,p)$ that measures how far $Q$ is from the original $p$; if $Q$ is optimal in this sense, we call it efficient. Our main results characterize efficient unbiased sparsifications for divergences that are either permutation-invariant or additively separable. Surprisingly, the characterization for permutation-invariant divergences is robust to the choice of divergence function: the class of optimal $Q$ for squared Euclidean distance coincides with the class of optimal $Q$ for Kullback-Leibler divergence, and indeed for any of a wide variety of divergences.
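For concreteness, here is a standard construction from the compression literature (often called rand-$m$, i.e. uniform subsampling with rescaling) that is unbiased by design, though not claimed here to be efficient: sample a uniformly random subset $S\subseteq\{1,\dots,n\}$ with $|S|=m$ and rescale the surviving coordinates by $n/m$. Since each coordinate survives with probability $m/n$,
$$Q_i \;=\; \begin{cases} \tfrac{n}{m}\,p_i, & i\in S,\\ 0, & i\notin S, \end{cases} \qquad \mathbb{E}[Q_i] \;=\; \Pr(i\in S)\cdot\tfrac{n}{m}\,p_i \;=\; \tfrac{m}{n}\cdot\tfrac{n}{m}\,p_i \;=\; p_i,$$
so $\mathbb{E}[Q]=p$ and $Q$ has at most $m$ nonzero coordinates. Whether such a baseline also minimizes $\mathbb{E}\,\mathsf{Div}(Q,p)$ depends on the divergence, which is precisely the question the efficiency results address.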