Coupling without Communication and Drafter-Invariant Speculative Decoding

Suppose Alice has a distribution $P$ and Bob has a distribution $Q$. Alice wants to generate a sample $a\sim P$ and Bob a sample $b \sim Q$ such that $a = b$ with as high a probability as possible. It is well known that, by sampling from an optimal coupling between the distributions, Alice and Bob can achieve $Pr[a = b] = 1 - D_{TV}(P,Q)$, where $D_{TV}(P,Q)$ is the total variation distance. What if Alice and Bob must solve this same problem without communicating at all? Perhaps surprisingly, with access to public randomness, they can still achieve $Pr[a = b] \geq \frac{1 - D_{TV}(P,Q)}{1 + D_{TV}(P,Q)} \geq 1-2D_{TV}(P,Q)$. In fact, this bound can be obtained using a simple protocol based on the Weighted MinHash algorithm. In this work, we explore communication-free coupling in greater depth. First, we show that an equally simple protocol based on Gumbel sampling matches the worst-case guarantees of the Weighted MinHash approach, but tends to perform better in practice. Conversely, we prove that both approaches are actually sharp: no communication-free protocol can achieve $Pr[a=b]>\frac{1 - D_{TV}(P,Q)}{1 + D_{TV}(P,Q)}$ in the worst case. Finally, we prove that, for distributions over $n$ items, there exists a scheme that uses just $O(\log(n/\epsilon))$ bits of communication to achieve $Pr[a = b] = 1 - D_{TV}(P,Q) - \epsilon$, i.e., to essentially match optimal coupling. Beyond our theoretical results, we demonstrate an application of communication-free coupling to speculative decoding, a recent method for accelerating autoregressive large language models [Leviathan, Kalman, Matias, ICML 2023]. We show that communication-free protocols yield a variant of speculative decoding that we call Drafter-Invariant Speculative Decoding, which has the desirable property that the output of the method is fixed given a fixed random seed, regardless of what drafter is used for speculation.
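
As a rough illustration of the Gumbel-based idea, the sketch below has both parties derive the same Gumbel noise from a shared seed (standing in for public randomness) and each return the argmax of log-probability plus noise over their own distribution. This is a minimal sketch using the standard Gumbel-max trick, not necessarily the exact protocol analyzed in the paper; the function names `gumbel_coupled_sample`, `total_variation`, and `agreement_rate`, and the example distributions, are hypothetical and chosen only for illustration.

```python
import numpy as np


def gumbel_coupled_sample(probs, seed):
    """Draw one index from `probs` using noise that depends only on `seed`.

    Two parties calling this with the same seed see identical Gumbel noise,
    so no message is exchanged. (Assumption: a shared seed models the public
    randomness mentioned in the abstract.)
    """
    probs = np.asarray(probs, dtype=float)
    rng = np.random.default_rng(seed)          # shared public randomness
    noise = rng.gumbel(size=probs.shape[0])    # one Gumbel(0, 1) draw per item
    with np.errstate(divide="ignore"):
        scores = np.log(probs) + noise         # log(0) = -inf is never chosen
    return int(np.argmax(scores))              # Gumbel-max: index ~ probs


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * float(np.abs(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)).sum())


def agreement_rate(p, q, trials=2000):
    """Empirical fraction of seeds on which Alice (p) and Bob (q) agree."""
    hits = sum(
        gumbel_coupled_sample(p, seed) == gumbel_coupled_sample(q, seed)
        for seed in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    P = [0.5, 0.3, 0.2]   # Alice's distribution (illustrative values)
    Q = [0.4, 0.4, 0.2]   # Bob's distribution (illustrative values)
    d = total_variation(P, Q)
    print("D_TV:", d)
    print("communication-free lower bound:", (1 - d) / (1 + d))
    print("optimal coupling:", 1 - d)
    print("empirical agreement:", agreement_rate(P, Q))
```

Because each party's output depends only on the seed and on its own distribution, Alice's sample does not change when Bob's distribution is swapped for another one. That is the same property the abstract describes for Drafter-Invariant Speculative Decoding, where the output is fixed by the seed regardless of the drafter.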
