The Gaussian process (GP) is a powerful tool for nonparametric modeling, but its sensitivity to outliers limits its applicability to heavy-tailed data distributions. Student-t processes offer a robust alternative for heavy-tailed modeling, but they lack the scalable extensions that make GPs practical on large datasets. We present Sparse Variational Student-t Processes (SVTP), the first principled framework that extends the sparse inducing-point method to the Student-t process. We develop two novel inference algorithms, SVTP-UB and SVTP-MC, with theoretical guarantees, and derive a natural-gradient optimization that exploits a previously unused connection between the Fisher information matrix of the multivariate Student-t distribution and the beta function (the 'beta link'). Experiments on UCI and Kaggle datasets demonstrate that SVTP significantly outperforms sparse GPs when the data contains outliers and heavy tails, achieving up to 3 times faster convergence and 40% lower prediction error while maintaining computational efficiency on datasets with over 200,000 samples.