We study approaches for compressing the empirical measure in the context of finite-dimensional reproducing kernel Hilbert spaces (RKHSs). In this context, the empirical measure is contained within a natural convex set and can be approximated using convex optimization methods. Under certain conditions, such an approximation gives rise to a coreset of data points. A key quantity that controls how large such a coreset has to be is the radius of the largest ball around the empirical measure that is contained within the empirical convex set. The bulk of our work is concerned with deriving high-probability lower bounds on the size of such a ball under various conditions. We complement this lower bound by developing techniques that allow us to apply the compression approach to concrete inference problems such as kernel ridge regression. We conclude with a construction of an infinite-dimensional RKHS for which the compression is poor, highlighting some of the difficulties one faces when trying to move to infinite-dimensional RKHSs.
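To make the convex-optimization view concrete, here is a minimal sketch in which the empirical measure is represented by its mean embedding mu = (1/n) sum_i phi(x_i) in a finite-dimensional feature space. That vector lies in the convex hull of the points phi(x_i), so it can be approximated by a sparse convex combination of them. The sketch assumes Frank-Wolfe with exact line search as the convex optimization method and a cubic polynomial feature map; neither is claimed to be the authors' exact choice, and the names `feature_map` and `frank_wolfe_coreset` are hypothetical. Each iteration adds at most one data point to the support, so the support of the returned weights is the coreset. Frank-Wolfe is known to converge linearly when the target lies strictly inside the feasible set, at a rate governed by the radius of such a ball, which is why that radius controls the coreset size.

```python
import numpy as np


def feature_map(x):
    # Hypothetical finite-dimensional feature map: cubic polynomial features (D = 3).
    return np.stack([x, x**2, x**3], axis=1)


def frank_wolfe_coreset(Phi, n_iter=50, tol=1e-12):
    """Approximate the mean of the rows of Phi by a sparse convex combination.

    Minimizes f(w) = 0.5 * ||Phi^T w - mu||^2 over the probability simplex,
    where mu is the empirical mean embedding. Returns the weights w and the
    final approximation error ||Phi^T w - mu||.
    """
    n = Phi.shape[0]
    mu = Phi.mean(axis=0)            # empirical mean embedding
    w = np.zeros(n)
    w[0] = 1.0                       # start at a vertex (a single data point)
    z = Phi[0].copy()                # current point z = Phi^T w in feature space
    for _ in range(n_iter):
        grad = Phi @ (z - mu)        # gradient of f with respect to w
        i = int(np.argmin(grad))     # linear minimization oracle over the simplex
        d = Phi[i] - z               # direction towards the selected vertex
        gap = (mu - z) @ d           # Frank-Wolfe duality gap
        if gap <= tol:               # (near-)optimal: stop early
            break
        gamma = min(gap / (d @ d), 1.0)  # exact line search, clipped to [0, 1]
        w *= 1.0 - gamma             # convex update keeps w on the simplex
        w[i] += gamma
        z = z + gamma * d
    return w, np.linalg.norm(z - mu)


# Usage: compress 1000 uniform samples and report coreset size and error.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=1000)
Phi = feature_map(x)
w, err = frank_wolfe_coreset(Phi, n_iter=50)
print("coreset size:", np.count_nonzero(w), "error:", err)
```

The weights stay non-negative and sum to one by construction, and the number of nonzero weights is at most `n_iter + 1`.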