The distribution regression problem encompasses many important statistics and machine learning tasks, and arises in a wide range of applications. Among the existing approaches to this problem, kernel methods have become a method of choice. Indeed, kernel distribution regression is both computationally favorable and supported by a recent learning theory. This theory also covers the two-stage sampling setting, where only samples from the input distributions are available. In this paper, we improve the learning theory of kernel distribution regression. We address kernels based on Hilbertian embeddings, which encompass most, if not all, of the existing approaches. We introduce a novel near-unbiased condition on the Hilbertian embeddings, which enables us to provide new error bounds on the effect of the two-stage sampling, thanks to a new analysis. We show that this near-unbiased condition holds for three important classes of kernels, based on optimal transport and mean embedding. As a consequence, we strictly improve the existing convergence rates for these kernels. Our setting and results are illustrated by numerical experiments.
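To make the two-stage sampling setting concrete, the following is a minimal sketch of kernel distribution regression with a mean-embedding kernel. It is an illustration under stated assumptions, not the experiments of the paper: the toy data (one-dimensional Gaussians whose mean is the regression target), the Gaussian base kernel, the linear kernel on the embeddings, the ridge estimator, and all function names and parameter values are choices made here for illustration.

```python
import numpy as np


def gaussian_gram(x, y, bandwidth):
    """Pairwise Gaussian kernel between the rows of x (n, d) and y (m, d)."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mean_embedding_gram(bags_a, bags_b, bandwidth):
    """Inner products of empirical mean embeddings.

    Entry (i, j) averages the base kernel over all sample pairs drawn from
    bag i and bag j, which estimates the inner product of the two embeddings.
    """
    return np.array(
        [[gaussian_gram(a, b, bandwidth).mean() for b in bags_b] for a in bags_a]
    )


def fit_ridge(gram, targets, reg):
    """Kernel ridge regression coefficients (illustrative estimator)."""
    n = gram.shape[0]
    return np.linalg.solve(gram + n * reg * np.eye(n), targets)


def make_bags(rng, num_bags, bag_size):
    """Toy two-stage sampling (assumed data model, for illustration only)."""
    # Stage 1: draw the input distributions N(m, 1) through their means m.
    means = rng.uniform(-2.0, 2.0, size=num_bags)
    # Stage 2: only bag_size samples from each distribution are observed.
    bags = [rng.normal(m, 1.0, size=(bag_size, 1)) for m in means]
    # The regression target is the mean of the unobserved distribution.
    return bags, means


rng = np.random.default_rng(0)
train_bags, train_y = make_bags(rng, num_bags=60, bag_size=50)
test_bags, test_y = make_bags(rng, num_bags=20, bag_size=50)

K_train = mean_embedding_gram(train_bags, train_bags, bandwidth=1.0)
alpha = fit_ridge(K_train, train_y, reg=1e-3)

K_test = mean_embedding_gram(test_bags, train_bags, bandwidth=1.0)
pred = K_test @ alpha
mse = float(np.mean((pred - test_y) ** 2))
print(f"test MSE: {mse:.4f}")
```

In this sketch the regressor never sees the distributions themselves, only the bags of samples, so the bag size controls the second-stage sampling error discussed above. Replacing `mean_embedding_gram` with another Hilbertian embedding would change the kernel without changing the rest of the pipeline.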