Total variation distance (TV distance) is an important measure of the difference between two distributions. Recent work has made progress on approximating the TV distance between product distributions: a deterministic algorithm for a restricted class of product distributions (Bhattacharyya, Gayen, Meel, Myrisiotis, Pavan and Vinodchandran 2023) and a randomized algorithm for general product distributions (Feng, Guo, Jerrum and Wang 2023). We give a deterministic fully polynomial-time approximation scheme (FPTAS) for the TV distance between product distributions. Given two product distributions $\mathbb{P}$ and $\mathbb{Q}$ over $[q]^n$, our algorithm approximates their TV distance with relative error $\varepsilon$ in time $O\bigl( \frac{qn^2}{\varepsilon} \log q \log \frac{n}{\varepsilon \Delta_{\text{TV}}(\mathbb{P},\mathbb{Q}) } \bigr)$. Our algorithm is built around two key concepts: 1) the likelihood ratio viewed as a distribution, which captures sufficient information to compute the TV distance; and 2) a new metric between likelihood ratio distributions, which we call the minimum total variation distance. The algorithm computes a sparsified likelihood ratio distribution that is close to the original one with respect to this metric, and the approximate TV distance is then computed from the sparsified likelihood ratio distribution. Our technique also implies a deterministic FPTAS for the TV distance between Markov chains.
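To illustrate why the likelihood ratio distribution carries enough information to recover the TV distance, the following is a minimal sketch, not the algorithm of the paper. It relies on the standard identity $\Delta_{\text{TV}}(\mathbb{P},\mathbb{Q}) = \mathbb{E}_{X \sim \mathbb{P}}\bigl[\max\bigl(0, 1 - \mathbb{Q}(X)/\mathbb{P}(X)\bigr)\bigr]$ and builds the exact law of the ratio one coordinate at a time. The function names, the input layout (a list of per-coordinate probability vectors), and the choice of taking the ratio $\mathbb{Q}/\mathbb{P}$ under $\mathbb{P}$ are assumptions made for this example. No sparsification is performed, so the support of the ratio distribution may grow exponentially in $n$; that blow-up is what a sparsification step would need to control.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def tv_brute_force(P, Q):
    """TV distance by enumerating all outcomes (exponential in n)."""
    total = Fraction(0)
    for x in product(*(range(len(p)) for p in P)):
        px, qx = Fraction(1), Fraction(1)
        for i, c in enumerate(x):
            px *= Fraction(P[i][c])
            qx *= Fraction(Q[i][c])
        total += abs(px - qx)
    return total / 2


def ratio_distribution(P, Q):
    """Exact law of Q(X)/P(X) for X ~ P, as a dict {ratio: probability}.

    Illustrative only: nothing is merged except exactly equal ratios,
    so the dict can hold up to q^n entries.
    """
    dist = {Fraction(1): Fraction(1)}
    for p, q in zip(P, Q):
        nxt = defaultdict(Fraction)
        for r, mass in dist.items():
            for pc, qc in zip(p, q):
                pc, qc = Fraction(pc), Fraction(qc)
                if pc > 0:  # X ~ P never lands on zero-probability symbols
                    nxt[r * qc / pc] += mass * pc
        dist = dict(nxt)
    return dist


def tv_from_ratio(dist):
    """TV distance as the expectation of max(0, 1 - ratio)."""
    return sum(mass * max(Fraction(0), 1 - r) for r, mass in dist.items())


# Two product distributions over [2]^2, one probability vector per coordinate.
P = [[Fraction(1, 2), Fraction(1, 2)], [Fraction(1, 3), Fraction(2, 3)]]
Q = [[Fraction(1, 4), Fraction(3, 4)], [Fraction(1, 2), Fraction(1, 2)]]

print(tv_from_ratio(ratio_distribution(P, Q)))  # 1/4
print(tv_brute_force(P, Q))                     # 1/4
```

Exact rational arithmetic is used so that equal ratios collapse onto the same dictionary key and the two computations can be compared without floating-point tolerance.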