We show how any PAC learning algorithm that works under the uniform distribution can be transformed, in a blackbox fashion, into one that works under an arbitrary and unknown distribution $\mathcal{D}$. The efficiency of our transformation scales with the inherent complexity of $\mathcal{D}$, running in $\mathrm{poly}(n, (md)^d)$ time for distributions over $\{\pm 1\}^n$ whose pmfs are computed by depth-$d$ decision trees, where $m$ is the sample complexity of the original algorithm. For monotone distributions our transformation uses only samples from $\mathcal{D}$, and for general ones it uses subcube conditioning samples. A key technical ingredient is an algorithm that, given the aforementioned access to $\mathcal{D}$, produces an optimal decision tree decomposition of $\mathcal{D}$: an approximation of $\mathcal{D}$ as a mixture of uniform distributions over disjoint subcubes. With this decomposition in hand, we run the uniform-distribution learner on each subcube and combine the hypotheses using the decision tree. This algorithmic decomposition lemma also yields new algorithms for learning decision tree distributions with runtimes that exponentially improve on the prior state of the art; these results are of independent interest in distribution learning.
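The following Python sketch illustrates only the final combination step described above, assuming a decision tree decomposition has already been computed: each leaf fixes the coordinates on its root-to-leaf path (a subcube) and carries a mixture weight, labeled samples are routed to their leaf, a uniform-distribution learner is run per leaf, and the tree dispatches each point to its leaf's hypothesis. The `Node` structure and the functions `leaves`, `route`, `sample_mixture`, `learn_with_decomposition`, and the toy `majority_learner` are hypothetical names chosen for illustration. The sketch does not implement the decomposition algorithm itself, and `majority_learner` is a trivial stand-in rather than a real PAC learner.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[int, ...]                      # a point in {-1, +1}^n
Hypothesis = Callable[[Point], int]          # maps a point to a label in {-1, +1}
Sample = Tuple[Point, int]


@dataclass
class Node:
    """Decision-tree node over {-1, +1}^n (hypothetical structure for illustration)."""
    var: Optional[int] = None         # coordinate queried here; None marks a leaf
    neg: Optional["Node"] = None      # subtree for x[var] == -1
    pos: Optional["Node"] = None      # subtree for x[var] == +1
    weight: float = 0.0               # leaf only: mixture weight of this subcube
    hyp: Optional[Hypothesis] = None  # leaf only: hypothesis trained on this subcube


def leaves(node: Node, path: Optional[Dict[int, int]] = None) -> List[Tuple[Node, Dict[int, int]]]:
    """List (leaf, restriction) pairs; a restriction fixes the coordinates queried
    on the root-to-leaf path, so each leaf corresponds to one subcube."""
    path = {} if path is None else path
    if node.var is None:
        return [(node, path)]
    return (leaves(node.neg, {**path, node.var: -1})
            + leaves(node.pos, {**path, node.var: +1}))


def route(tree: Node, x: Point) -> Node:
    """Follow x down the tree to the leaf whose subcube contains it."""
    node = tree
    while node.var is not None:
        node = node.pos if x[node.var] == 1 else node.neg
    return node


def sample_mixture(tree: Node, n: int, rng: random.Random) -> Point:
    """Draw from the mixture: pick a subcube by weight, then sample uniformly inside it."""
    pairs = leaves(tree)
    _, restriction = rng.choices(pairs, weights=[leaf.weight for leaf, _ in pairs])[0]
    return tuple(restriction[i] if i in restriction else rng.choice((-1, 1))
                 for i in range(n))


def learn_with_decomposition(tree: Node, data: List[Sample],
                             uniform_learner: Callable[[List[Sample]], Hypothesis]) -> Hypothesis:
    """Run the uniform-distribution learner separately on each subcube's samples
    and stitch the resulting hypotheses together with the decision tree."""
    buckets: Dict[int, List[Sample]] = {id(leaf): [] for leaf, _ in leaves(tree)}
    for x, y in data:
        buckets[id(route(tree, x))].append((x, y))
    for leaf, _ in leaves(tree):
        leaf.hyp = uniform_learner(buckets[id(leaf)])
    return lambda x: route(tree, x).hyp(x)


def majority_learner(samples: List[Sample]) -> Hypothesis:
    """Toy stand-in for a uniform-distribution learner: predicts the majority label."""
    label = 1 if sum(y for _, y in samples) >= 0 else -1
    return lambda x: label


# Toy usage: a depth-1 decomposition on coordinate 0 with target f(x) = x[0].
n = 3
tree = Node(var=0, neg=Node(weight=0.3), pos=Node(weight=0.7))
rng = random.Random(0)
xs = [sample_mixture(tree, n, rng) for _ in range(300)]
h = learn_with_decomposition(tree, [(x, x[0]) for x in xs], majority_learner)
print(all(h(x) == x[0] for x in itertools.product((-1, 1), repeat=n)))
```

In this toy example the target is constant on each subcube, so even the trivial per-leaf learner yields a combined hypothesis that is exact on all of $\{\pm 1\}^3$; with a genuine uniform-distribution learner, each leaf would instead receive a learner suited to the concept class restricted to that subcube.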