Faster Relative Entropy Coding with Greedy Rejection Coding

Relative entropy coding (REC) algorithms encode a sample from a target distribution $Q$ using as few bits as possible, with the help of a proposal distribution $P$. Unlike entropy coding, REC does not assume discrete distributions or require quantisation. As such, it can be naturally integrated into communication pipelines such as learnt compression and differentially private federated learning. Unfortunately, despite their practical benefits, REC algorithms have not seen widespread application due to their prohibitively slow runtimes or restrictive assumptions. In this paper, we make progress towards addressing these issues. We introduce Greedy Rejection Coding (GRC), which generalises the rejection-based algorithm of Harsha et al. (2007) to arbitrary probability spaces and partitioning schemes. We first show that GRC terminates almost surely and returns unbiased samples from $Q$, after which we focus on two of its variants: GRCS and GRCD. We show that for continuous $Q$ and $P$ over $\mathbb{R}$ with unimodal density ratio $dQ/dP$, the expected runtime of GRCS is upper bounded by $\beta D_{KL}[Q || P] + O(1)$, where $\beta \approx 4.82$, and its expected codelength is optimal. This makes GRCS the first REC algorithm with guaranteed optimal runtime for this class of distributions, up to the multiplicative constant $\beta$. This significantly improves upon the previous state-of-the-art method, A* coding (Flamich et al., 2022). Under the same assumptions, we experimentally observe and conjecture that the expected runtime and codelength of GRCD are upper bounded by $D_{KL}[Q || P] + O(1)$. Finally, we evaluate GRC in a variational autoencoder-based compression pipeline on MNIST, and show that a modified ELBO and an index-compression method can further improve compression efficiency.
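
To make the rejection-based idea concrete, here is a minimal sketch for finite discrete distributions, which is the setting of Harsha et al. (2007) that GRC generalises. It is an illustrative assumption, not the paper's GRCS or GRCD, and it does not use any partitioning scheme. The function names `grc_discrete_encode` and `grc_discrete_decode` are hypothetical. A `random.Random` generator seeded with a shared integer stands in for the common randomness between encoder and decoder.

At step $i$, the encoder draws a proposal $x_i \sim P$. It then greedily accepts as much of the still-unassigned mass of $Q$ at $x_i$ as the proposal can supply. Only the accepted index $i$ is transmitted, and the decoder replays the shared random stream up to step $i$.

```python
import random
from typing import List, Tuple


def grc_discrete_encode(q: List[float], p: List[float], seed: int,
                        max_steps: int = 100_000) -> Tuple[int, int]:
    """Hypothetical sketch of greedy rejection coding for finite discrete Q, P.

    Assumes q and p are probability vectors over symbols 0..n-1 and that
    p[x] > 0 wherever q[x] > 0. Returns (index, symbol), where index is the
    1-based step at which a proposal was accepted (truncated at max_steps).
    """
    rng = random.Random(seed)           # randomness shared with the decoder
    symbols = range(len(p))
    assigned = [0.0] * len(p)           # Q-mass already output at earlier steps
    for i in range(1, max_steps + 1):
        x = rng.choices(symbols, weights=p)[0]  # proposal x_i ~ P
        u = rng.random()                        # acceptance uniform
        reach = 1.0 - sum(assigned)             # P(no acceptance before step i)
        if reach <= 0.0:
            break
        # Greedy step: for every symbol, take as much of its remaining Q-mass
        # as this step's proposal can supply, never exceeding q[y] in total.
        gain = [min(reach * p[y], max(0.0, q[y] - assigned[y])) for y in symbols]
        denom = reach * p[x]
        if denom > 0.0 and u < gain[x] / denom:  # accept with prob. alpha_i(x)
            return i, x
        assigned = [a + g for a, g in zip(assigned, gain)]
    raise RuntimeError("no proposal accepted within max_steps")


def grc_discrete_decode(p: List[float], seed: int, index: int) -> int:
    """Replay the shared random stream and return the proposal at step `index`."""
    rng = random.Random(seed)
    symbols = range(len(p))
    x = -1  # placeholder; index is assumed to be >= 1
    for _ in range(index):
        x = rng.choices(symbols, weights=p)[0]  # same proposals as the encoder
        rng.random()                            # consume the acceptance uniform to stay in sync
    return x


if __name__ == "__main__":
    q = [0.5, 0.3, 0.2]  # example target Q
    p = [0.2, 0.3, 0.5]  # example proposal P
    index, symbol = grc_discrete_encode(q, p, seed=0)
    assert grc_discrete_decode(p, seed=0, index=index) == symbol
    print(index, symbol)
```

Calling the encoder with many seeds and tallying the returned symbols should reproduce $Q$ up to sampling noise. The decoder needs only $P$, the seed and the index. In this sketch, the number of proposals examined equals the transmitted index. Both the runtime and the codelength therefore depend on how large that index tends to be.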


