We revisit the problem of distribution learning within the framework of learning-augmented algorithms. In this setting, we explore the scenario where a probability distribution is provided as potentially inaccurate advice on the true, unknown distribution. Our objective is to develop learning algorithms whose sample complexity decreases as the quality of the advice improves, thereby surpassing standard learning lower bounds when the advice is sufficiently accurate. Specifically, we demonstrate that this is achievable for the problem of learning a multivariate Gaussian distribution $N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ in the PAC learning setting. In the classical advice-free setting, $\tilde{\Theta}(d^2/\varepsilon^2)$ samples are sufficient, and necessary in the worst case, to learn $d$-dimensional Gaussians up to TV distance $\varepsilon$ with constant probability. When we are additionally given a parameter $\tilde{\boldsymbol{\Sigma}}$ as advice, we show that $\tilde{O}(d^{2-\beta}/\varepsilon^2)$ samples suffice whenever $\| \tilde{\boldsymbol{\Sigma}}^{-1/2} \boldsymbol{\Sigma} \tilde{\boldsymbol{\Sigma}}^{-1/2} - \boldsymbol{I_d} \|_1 \leq \varepsilon d^{1-\beta}$ (where $\|\cdot\|_1$ denotes the entrywise $\ell_1$ norm) for any $\beta > 0$, yielding a polynomial improvement over the advice-free setting.
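The advice-quality condition above can be checked numerically. The following is a minimal sketch (not the paper's algorithm) that computes $\| \tilde{\boldsymbol{\Sigma}}^{-1/2} \boldsymbol{\Sigma} \tilde{\boldsymbol{\Sigma}}^{-1/2} - \boldsymbol{I_d} \|_1$ for given covariance matrices; the function name `advice_quality` and the example matrices are illustrative, not from the source.

```python
import numpy as np

def advice_quality(Sigma, Sigma_tilde):
    """Entrywise l1 norm of Sigma_tilde^{-1/2} Sigma Sigma_tilde^{-1/2} - I_d.

    Sigma_tilde must be symmetric positive definite so that its
    inverse square root exists.
    """
    # Inverse square root of the advice covariance via eigendecomposition.
    w, V = np.linalg.eigh(Sigma_tilde)
    inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    M = inv_sqrt @ Sigma @ inv_sqrt
    # Entrywise l1 norm of the deviation from the identity.
    return np.abs(M - np.eye(Sigma.shape[0])).sum()

d = 4
Sigma = np.diag(np.arange(1.0, d + 1))
# Perfect advice: the deviation is (numerically) zero, so the condition
# holds for every beta > 0.
print(advice_quality(Sigma, Sigma))
# Uninformative advice (the identity): the quantity is simply ||Sigma - I||_1.
print(advice_quality(Sigma, np.eye(d)))
```

When the printed quantity is at most $\varepsilon d^{1-\beta}$, the advice $\tilde{\boldsymbol{\Sigma}}$ is accurate enough for the stated $\tilde{O}(d^{2-\beta}/\varepsilon^2)$ sample bound to apply.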