We propose two approaches to extend the notion of knowledge distillation to Gaussian Process Regression (GPR) and Gaussian Process Classification (GPC): data-centric and distribution-centric. The data-centric approach resembles most current distillation techniques in machine learning and refits a model on deterministic predictions from the teacher, whereas the distribution-centric approach reuses the full probabilistic posterior in the next iteration. By analyzing the properties of these approaches, we show that the data-centric approach for GPR is closely related to known results for self-distillation of kernel ridge regression, and that the distribution-centric approach for GPR corresponds to ordinary GPR with a particular choice of hyperparameters. Furthermore, we demonstrate that the distribution-centric approach for GPC approximately corresponds to data duplication combined with a particular scaling of the covariance, and that the data-centric approach for GPC requires redefining the model from a Binomial likelihood to a continuous Bernoulli likelihood to be well-specified. To the best of our knowledge, our proposed approaches are the first to formulate knowledge distillation specifically for Gaussian Process models.
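As a minimal illustration of the data-centric idea for GPR, the sketch below repeatedly refits a GP on the previous model's posterior mean at the training inputs. It is not the paper's implementation: the RBF kernel, the fixed noise variance `noise`, the toy data, and the helper names `rbf_kernel` and `data_centric_distill` are all assumptions made for the example, and hyperparameters are held fixed across iterations.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    # Squared-exponential kernel between two sets of 1-D inputs (assumed kernel).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def data_centric_distill(X, y, noise=0.1, steps=3):
    """Refit a GPR on the previous posterior mean at the training inputs."""
    K = rbf_kernel(X, X)
    n = len(X)
    # One refit maps targets t to K (K + noise * I)^{-1} t.
    # K and (K + noise * I)^{-1} commute, so solve() gives the same matrix.
    A = np.linalg.solve(K + noise * np.eye(n), K)
    targets = y.copy()
    history = [targets]
    for _ in range(steps):
        targets = A @ targets  # deterministic teacher predictions become new targets
        history.append(targets)
    return A, history

# Toy data (hypothetical): noisy samples of a sine curve.
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 20)
y = np.sin(X) + 0.1 * rng.standard_normal(20)
A, history = data_centric_distill(X, y, noise=0.1, steps=3)
```

Under these assumptions, each step rescales the component of the targets along a kernel eigenvector with eigenvalue d by d / (d + noise), which is the same shrinkage that appears in kernel ridge regression with ridge parameter `noise`.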