Given a sequence of samples $x_1, \dots , x_k$ promised to be drawn from one of two distributions $X_0, X_1$, a well-studied problem in statistics is to decide $\textit{which}$ distribution the samples are from. Information-theoretically, the maximum advantage in distinguishing the two distributions given $k$ samples is captured by the total variation distance between $X_0^{\otimes k}$ and $X_1^{\otimes k}$. However, when we restrict our attention to $\textit{efficient distinguishers}$ (i.e., small circuits) of these two distributions, exactly characterizing the ability to distinguish $X_0^{\otimes k}$ and $X_1^{\otimes k}$ is more involved and less understood. In this work, we give a general way to reduce bounds on the computational indistinguishability of $X_0$ and $X_1$ to bounds on the $\textit{information-theoretic}$ indistinguishability of some specific, related variables $\widetilde{X}_0$ and $\widetilde{X}_1$. As a consequence, we prove a new, tight characterization of the number of samples $k$ needed to efficiently distinguish $X_0^{\otimes k}$ and $X_1^{\otimes k}$ with constant advantage as \[ k = \Theta\left(d_H^{-2}\left(\widetilde{X}_0, \widetilde{X}_1\right)\right), \] which is the inverse of the squared Hellinger distance $d_H$ between two distributions $\widetilde{X}_0$ and $\widetilde{X}_1$ that are computationally indistinguishable from $X_0$ and $X_1$. Likewise, our framework can be used to re-derive a result of Geier (TCC 2022), proving nearly-tight bounds on how computational indistinguishability scales with the number of samples for arbitrary product distributions.
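To make the sample-complexity bound concrete, here is a minimal sketch (not from the paper) that computes the Hellinger distance between two discrete distributions and the order of magnitude $d_H^{-2}$ suggested by the theorem; the choice of Bernoulli distributions and the parameter `eps` are illustrative assumptions.

```python
import math

def hellinger(p, q):
    # Hellinger distance between two discrete distributions given as
    # probability vectors over the same support:
    #   d_H(P, Q) = sqrt( (1/2) * sum_i (sqrt(p_i) - sqrt(q_i))^2 )
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))

# Illustrative example: two nearby Bernoulli distributions,
# X_0 = Ber(1/2) and X_1 = Ber(1/2 + eps).
eps = 0.01
p = [0.5, 0.5]
q = [0.5 + eps, 0.5 - eps]

d = hellinger(p, q)
# The theorem says the number of samples needed to distinguish the
# product distributions with constant advantage scales as d_H^{-2}
# (applied here directly to X_0, X_1 rather than the related
# variables \tilde{X}_0, \tilde{X}_1 from the paper).
k = 1.0 / d ** 2
```

For small `eps`, $d_H^2 = \Theta(\varepsilon^2)$, so the implied sample count grows like $1/\varepsilon^2$, matching the classical information-theoretic intuition.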