Gaussian process regression (GPR), or kernel ridge regression, is a widely used and powerful tool for nonlinear prediction. Active learning (AL) for GPR, which actively collects data labels to achieve accurate prediction with fewer labels, is therefore an important problem. However, existing AL methods do not theoretically guarantee prediction accuracy for the target distribution. Furthermore, as discussed in the distributionally robust learning literature, specifying the target distribution precisely is often difficult. This paper therefore proposes two AL methods that effectively reduce the worst-case expected error for GPR, i.e., the worst-case expectation over a set of candidate target distributions. We derive an upper bound on the worst-case expected squared error, which shows that the error can be made arbitrarily small with a finite number of data labels under mild conditions. Finally, we demonstrate the effectiveness of the proposed methods on synthetic and real-world datasets.
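To make the worst-case objective concrete, the following is a minimal sketch (not the paper's actual algorithms) of greedy AL that reduces the worst-case expected GP posterior variance, a standard surrogate for expected squared error. It assumes an RBF kernel, represents each candidate target distribution by an empirical sample of test inputs, and the helper names (`posterior_var`, `worst_case_expected_var`, `greedy_al`) are illustrative inventions.

```python
import numpy as np

def rbf_kernel(A, B, ls=0.3):
    # Squared-exponential kernel with unit signal variance.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

def posterior_var(X_train, X_test, ls=0.3, noise=1e-3):
    # GP posterior variance at X_test given labeled inputs X_train.
    if len(X_train) == 0:
        return np.ones(len(X_test))  # prior variance of the RBF kernel
    K = rbf_kernel(X_train, X_train, ls) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_train, X_test, ls)
    v = np.linalg.solve(K, Ks)
    return 1.0 - (Ks * v).sum(0)

def worst_case_expected_var(X_train, dists, ls=0.3):
    # Worst case over candidate target distributions, each given as an
    # empirical sample of test inputs, of the expected posterior variance.
    return max(posterior_var(X_train, D, ls).mean() for D in dists)

def greedy_al(pool, dists, n_queries, ls=0.3):
    # Greedily query the pool point that most reduces the worst-case objective.
    chosen = []
    for _ in range(n_queries):
        X_now = pool[chosen]
        remaining = [i for i in range(len(pool)) if i not in chosen]
        best = min(remaining, key=lambda i: worst_case_expected_var(
            np.vstack([X_now, pool[i:i + 1]]), dists, ls))
        chosen.append(best)
    return chosen
```

With two candidate distributions concentrated on different regions of the input space, this greedy rule spreads queries so that neither region's expected variance dominates, whereas AL tuned to a single fixed distribution can neglect the other candidate.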