Gaussian processes are a powerful class of non-linear models, but their applicability to larger datasets is limited by their high computational cost, which scales cubically with the number of observations. In such cases, approximate methods are required, for example, the recently developed class of Hilbert space Gaussian processes. These have been shown to significantly reduce computation time while retaining most of the favorable properties of exact Gaussian processes. However, Hilbert space approximations have so far only been developed for one-dimensional outputs and manifest (known) inputs. We therefore generalize Hilbert space methods to multi-output and latent input settings. Through extensive simulations, we show that the resulting approximate Gaussian processes are not only faster but also provide similar or even better uncertainty calibration and accuracy of latent variable estimates compared to exact Gaussian processes. While not necessarily faster than alternative Gaussian process approximations, our new models provide better calibration and estimation accuracy, thus striking an excellent balance between trustworthiness and speed. We additionally illustrate our methods on a real-world case study from single-cell biology.
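To illustrate where the speed-up of Hilbert space Gaussian processes comes from, here is a minimal sketch of the standard single-output, known-input construction that the abstract says is being generalized. It is not the multi-output or latent-input method described here. The sketch assumes a one-dimensional squared exponential kernel, a domain [-L, L] with Dirichlet boundary conditions, and m basis functions. The kernel is approximated as K ≈ Z Zᵀ, where each column of Z is a Laplacian eigenfunction scaled by the square root of the kernel's spectral density at that eigenfunction's frequency. The function names (`se_spectral_density`, `hsgp_features`, `se_kernel`) and the values of L, m, the lengthscale, and the noise variance are illustrative assumptions.

```python
import numpy as np

def se_spectral_density(omega, variance=1.0, lengthscale=1.0):
    """Spectral density of the 1D squared exponential kernel."""
    return variance * np.sqrt(2.0 * np.pi) * lengthscale * np.exp(-0.5 * (lengthscale * omega) ** 2)

def hsgp_features(x, num_basis, L, variance=1.0, lengthscale=1.0):
    """Hypothetical helper: features Z such that K ~= Z @ Z.T on [-L, L]."""
    j = np.arange(1, num_basis + 1)                      # basis indices 1..m
    sqrt_eig = np.pi * j / (2.0 * L)                     # square roots of Laplacian eigenvalues
    phi = np.sqrt(1.0 / L) * np.sin(sqrt_eig * (x[:, None] + L))  # (n, m) eigenfunctions
    spd = se_spectral_density(sqrt_eig, variance, lengthscale)     # (m,) spectral weights
    return phi * np.sqrt(spd)                            # scale each column by sqrt(S)

def se_kernel(a, b, variance=1.0, lengthscale=1.0):
    """Exact squared exponential kernel matrix."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

# Illustrative synthetic data on [-1, 1] (assumed, not from the paper)
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1.0, 1.0, 50))
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(50)
x_test = np.linspace(-1.0, 1.0, 25)
noise_var, ell, L, m = 0.01, 0.5, 3.0, 40

# Exact GP posterior mean: solves an n x n system, O(n^3)
K = se_kernel(x, x, lengthscale=ell)
mean_exact = se_kernel(x_test, x, lengthscale=ell) @ np.linalg.solve(K + noise_var * np.eye(len(x)), y)

# Hilbert space approximation: solves an m x m system, O(n m^2 + m^3)
Z = hsgp_features(x, m, L, lengthscale=ell)
Z_test = hsgp_features(x_test, m, L, lengthscale=ell)
w = np.linalg.solve(Z.T @ Z + noise_var * np.eye(m), Z.T @ y)
mean_hsgp = Z_test @ w

kernel_err = np.max(np.abs(Z @ Z.T - K))
mean_err = np.max(np.abs(mean_hsgp - mean_exact))
print(f"max kernel error: {kernel_err:.2e}, max mean error: {mean_err:.2e}")
```

The approximate posterior mean uses the identity Z* Zᵀ(Z Zᵀ + σ²I)⁻¹y = Z*(ZᵀZ + σ²I)⁻¹Zᵀy. This replaces the n × n solve with an m × m one, which is where the savings come from when m is much smaller than n. Accuracy depends on choosing L comfortably larger than the input range and m large relative to L divided by the lengthscale.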