Characterizing the Distinguishability of Product Distributions through Multicalibration

Given a sequence of samples $x_1, \dots, x_k$ promised to be drawn from one of two distributions $X_0, X_1$, a well-studied problem in statistics is to decide $\textit{which}$ distribution the samples are from. Information-theoretically, the maximum advantage in distinguishing the two distributions given $k$ samples is captured by the total variation distance between $X_0^{\otimes k}$ and $X_1^{\otimes k}$. However, when we restrict our attention to $\textit{efficient distinguishers}$ (i.e., small circuits) of these two distributions, exactly characterizing the ability to distinguish $X_0^{\otimes k}$ and $X_1^{\otimes k}$ is more involved and less well understood. In this work, we give a general way to reduce bounds on the computational indistinguishability of $X_0$ and $X_1$ to bounds on the $\textit{information-theoretic}$ indistinguishability of some specific, related variables $\widetilde{X}_0$ and $\widetilde{X}_1$. As a consequence, we prove a new, tight characterization of the number of samples $k$ needed to efficiently distinguish $X_0^{\otimes k}$ and $X_1^{\otimes k}$ with constant advantage as \[ k = \Theta\left(d_H^{-2}\left(\widetilde{X}_0, \widetilde{X}_1\right)\right), \] that is, the inverse of the squared Hellinger distance $d_H$ between two distributions $\widetilde{X}_0$ and $\widetilde{X}_1$ that are computationally indistinguishable from $X_0$ and $X_1$. Likewise, our framework can be used to re-derive a result of Halevi and Rabin (TCC 2008) and Geier (TCC 2022), proving nearly-tight bounds on how computational indistinguishability scales with the number of samples for arbitrary product distributions.
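To make the $\Theta(d_H^{-2})$ scaling concrete, the following is a minimal Python sketch of the information-theoretic side only. It assumes two explicitly given finite distributions (standing in for $\widetilde{X}_0$ and $\widetilde{X}_1$; it does not construct them from $X_0$ and $X_1$) and the normalization $d_H^2(P, Q) = 1 - \sum_i \sqrt{p_i q_i}$, which may differ by a constant factor from the convention used in the paper. The function names and the example distributions are hypothetical. The sketch uses two standard facts: $1 - d_H^2$ is multiplicative over products, and $d_H^2 \le d_{TV} \le d_H \sqrt{2 - d_H^2}$.

```python
import math

def hellinger_sq(p, q):
    """Squared Hellinger distance 1 - sum_i sqrt(p_i * q_i) of two finite distributions.

    Assumed normalization: values lie in [0, 1].
    """
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return max(0.0, 1.0 - bc)  # clamp tiny negative values from rounding

def hellinger_sq_product(p, q, k):
    """Squared Hellinger distance between the k-fold products P^k and Q^k.

    Uses tensorization: 1 - d_H^2(P^k, Q^k) = (1 - d_H^2(P, Q))^k.
    """
    return 1.0 - (1.0 - hellinger_sq(p, q)) ** k

def tv_bounds_product(p, q, k):
    """Lower and upper bounds on the total variation distance between P^k and Q^k."""
    h2 = hellinger_sq_product(p, q, k)
    return h2, math.sqrt(h2 * (2.0 - h2))

# Hypothetical example: a fair coin versus a slightly biased coin.
p = [0.5, 0.5]
q = [0.6, 0.4]

h2 = hellinger_sq(p, q)
k = math.ceil(1.0 / h2)  # number of samples on the order of 1 / d_H^2
tv_lower, tv_upper = tv_bounds_product(p, q, k)

print(f"d_H^2 = {h2:.6f}, k = {k}")
print(f"TV(P^k, Q^k) lies in [{tv_lower:.3f}, {tv_upper:.3f}]")
```

With $k$ on the order of $1/d_H^2$, the lower bound on the total variation distance is already a constant, while for $k$ much smaller than $1/d_H^2$ the upper bound stays close to zero. This is the information-theoretic behaviour that the abstract's characterization transfers to the efficient setting.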
