Learning high-dimensional distributions is a central challenge in machine learning and statistics. Classical research has mostly concentrated on asymptotic analysis of such data under suitable assumptions. While existing works [Bhattacharyya et al.: SICOMP 2023, Daskalakis et al.: STOC 2021, Choo et al.: ALT 2024] focus on discrete distributions, the current work addresses the tree-structure learning problem for Gaussian distributions, providing efficient algorithms with solid theoretical guarantees. This is crucial because real-world distributions are often continuous and differ from the discrete settings studied in prior work. In this work, we design a conditional mutual information tester for Gaussian random variables that can test whether two Gaussian random variables are independent, or their conditional mutual information is at least $\varepsilon$, for a parameter $\varepsilon \in (0,1)$, using $\mathcal{O}(\varepsilon^{-1})$ samples, which we show to be near-optimal. In contrast, additive estimation would require $\Omega(\varepsilon^{-2})$ samples. Our upper-bound technique uses linear regression on a pair of suitably transformed random variables. Importantly, we show that the chain rule of conditional mutual information continues to hold for the estimated (conditional) mutual information. As an application of this mutual information tester, we give an efficient $\varepsilon$-approximate structure-learning algorithm for an $n$-variate Gaussian tree model that takes $\widetilde{\Theta}(n\varepsilon^{-1})$ samples, which we again show to be near-optimal. In contrast, when the underlying Gaussian model is not known to be tree-structured, we show that $\widetilde{\Theta}(n^2\varepsilon^{-2})$ samples are necessary and sufficient to output an $\varepsilon$-approximate tree structure. We perform extensive experiments that corroborate our theoretical convergence bounds.
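To make the linear-regression connection concrete: for jointly Gaussian $(X, Y)$, the mutual information has the closed form $I(X;Y) = -\tfrac{1}{2}\log(1-\rho^2)$, where $\rho$ is the Pearson correlation, and $\rho^2$ equals the $R^2$ of regressing $Y$ on $X$. The following is a minimal plug-in sketch of this identity, not the paper's actual tester (function names and sample sizes here are illustrative assumptions):

```python
# Hedged sketch: for jointly Gaussian (X, Y), I(X; Y) = -0.5 * log(1 - rho^2)
# in nats, where rho is the Pearson correlation. Since R^2 of an OLS fit of
# Y on X equals rho^2, estimating rho from samples gives a plug-in MI estimate.
import numpy as np

def gaussian_mi_estimate(x, y):
    """Plug-in estimate of I(X; Y) in nats from samples of a bivariate Gaussian."""
    rho = np.corrcoef(x, y)[0, 1]
    rho2 = min(rho * rho, 1 - 1e-12)  # guard against log(0) when |rho| ~ 1
    return -0.5 * np.log(1.0 - rho2)

# Synthetic bivariate Gaussian with correlation exactly 0.6:
# Var(Y) = 0.6^2 + 0.8^2 = 1, Cov(X, Y) = 0.6.
rng = np.random.default_rng(0)
n = 200_000
x = rng.standard_normal(n)
y = 0.6 * x + 0.8 * rng.standard_normal(n)

est = gaussian_mi_estimate(x, y)
true_mi = -0.5 * np.log(1 - 0.6**2)
```

With enough samples the plug-in estimate tracks the closed-form value closely; the paper's point is that *testing* (CMI $= 0$ vs. CMI $\geq \varepsilon$) needs only $\mathcal{O}(\varepsilon^{-1})$ samples, a quadratic saving over additive estimation.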
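For context on the structure-learning application, the textbook baseline is the Chow-Liu algorithm: estimate all pairwise mutual informations and return a maximum-weight spanning tree. The sketch below implements this classical baseline for Gaussian data using the closed-form MI and Kruskal's algorithm; it is *not* the paper's tester-based algorithm, which achieves the better $\widetilde{\Theta}(n\varepsilon^{-1})$ sample complexity:

```python
# Hedged sketch of the classical Chow-Liu baseline for a Gaussian tree model:
# weight each pair (i, j) by I(X_i; X_j) = -0.5 * log(1 - rho_ij^2), computed
# from the empirical correlation matrix, then take a maximum-weight spanning
# tree via Kruskal's algorithm with union-find.
import numpy as np

def chow_liu_tree(samples):
    """samples: (m, n) array of m observations; returns n-1 tree edges (i, j)."""
    n = samples.shape[1]
    corr = np.corrcoef(samples, rowvar=False)
    weights = []
    for i in range(n):
        for j in range(i + 1, n):
            rho2 = min(corr[i, j] ** 2, 1 - 1e-12)
            weights.append((-0.5 * np.log(1 - rho2), i, j))
    weights.sort(reverse=True)  # heaviest (most informative) edges first

    parent = list(range(n))     # union-find forest for cycle detection
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    edges = []
    for _, i, j in weights:
        ri, rj = find(i), find(j)
        if ri != rj:            # adding (i, j) keeps the graph acyclic
            parent[ri] = rj
            edges.append((i, j))
    return edges

# Toy Markov chain X0 -> X1 -> X2 with edge correlations 0.9: the recovered
# tree should be the path 0-1-2, since corr(X0, X2) = 0.81 < 0.9.
rng = np.random.default_rng(1)
m = 50_000
x0 = rng.standard_normal(m)
x1 = 0.9 * x0 + np.sqrt(1 - 0.81) * rng.standard_normal(m)
x2 = 0.9 * x1 + np.sqrt(1 - 0.81) * rng.standard_normal(m)
tree = chow_liu_tree(np.column_stack([x0, x1, x2]))
```

For tree-structured Gaussians the maximum-weight spanning tree over exact pairwise MIs recovers the true tree; the paper's contribution is doing this approximately from few samples, with matching lower bounds.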