We study the problem of learning mixtures of Gaussians with censored data. Statistical learning with censored data is a classical problem with numerous practical applications; however, finite-sample guarantees are missing even for simple latent variable models such as Gaussian mixtures. Formally, we are given censored data from a mixture of univariate Gaussians $$\sum_{i=1}^k w_i \mathcal{N}(\mu_i,\sigma^2),$$ that is, a sample is observed only if it lies inside a set $S$. The goal is to learn the weights $w_i$ and the means $\mu_i$. We propose an algorithm that takes only $\frac{1}{\varepsilon^{O(k)}}$ samples to estimate the weights $w_i$ and the means $\mu_i$ within error $\varepsilon$.
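
To make the observation model concrete, the following is a minimal sketch of how censored samples arise, not the estimation algorithm proposed here. It assumes that $S$ is an interval and uses hypothetical parameter values; the function name `sample_censored_mixture` and its arguments are illustrative only.

```python
import random


def sample_censored_mixture(weights, means, sigma, in_S, n, seed=0):
    """Draw n observed samples from sum_i w_i N(mu_i, sigma^2) censored to S.

    in_S is a predicate returning True when a point lies in the set S.
    """
    rng = random.Random(seed)
    observed = []
    while len(observed) < n:
        # Pick a component with probability proportional to its weight.
        mu = rng.choices(means, weights=weights, k=1)[0]
        # Draw from that component; all components share the same sigma.
        x = rng.gauss(mu, sigma)
        # Censoring: the sample is revealed only if it lies inside S.
        if in_S(x):
            observed.append(x)
    return observed


# Hypothetical example: k = 2 components and S = [0, 3] (both assumed).
weights = [0.3, 0.7]
means = [-1.0, 2.0]
samples = sample_censored_mixture(
    weights, means, sigma=1.0, in_S=lambda x: 0.0 <= x <= 3.0, n=1000
)

# The uncensored mixture mean is 0.3 * (-1) + 0.7 * 2 = 1.1, but the
# average of the observed samples is shifted by the censoring.
observed_mean = sum(samples) / len(samples)
```

Because samples outside $S$ are never seen, naive estimates such as `observed_mean` are biased away from the uncensored mixture mean, which is why recovering $w_i$ and $\mu_i$ needs an estimator that accounts for the censoring.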