The Linear Model of Co-regionalization (LMC) is a very general multitask Gaussian process model for regression or classification. While its expressiveness and conceptual simplicity are appealing, naive implementations have cubic complexity in the product (number of datapoints $\times$ number of tasks), making approximations mandatory for most applications. However, recent work has shown that in some settings the latent processes of the model can be decoupled, leading to a complexity that is only linear in the number of these processes. Here we extend these results, showing from the most general assumptions that the only condition necessary for efficient exact computation of the LMC is a mild hypothesis on the noise model. We introduce a full parametrization of the resulting \emph{projected LMC} model, enabling its efficient optimization. The effectiveness of this approach is assessed through synthetic and real-data experiments, which test in particular the behavior of its underlying noise model restriction.\\ Overall, the projected LMC emerges as a competitive and simpler alternative to state-of-the-art multitask Gaussian process models. It greatly facilitates some computations, such as training data updates or leave-one-out cross-validation, and is more interpretable, since it gives access to its low-dimensional quantities and to their explicit relation with the full-dimensional data. These qualities could facilitate the adoption by various industries of entire classes of methodologies, notably multitask Bayesian optimization.
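To make the cubic-cost claim concrete, the following is a minimal sketch of a naive LMC covariance construction, not the projected LMC itself. It assumes one-dimensional inputs, RBF kernels for the latent processes, a mixing matrix `H` of shape (tasks, latent processes), and diagonal per-task noise; all function and variable names are illustrative and do not come from the paper.

```python
import numpy as np

def rbf_kernel(x, lengthscale):
    # Squared-exponential kernel matrix on 1-D inputs (an assumed kernel choice).
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def naive_lmc_covariance(x, H, lengthscales, noise_var):
    # Covariance of the stacked observations, ordered task by task:
    #   K = sum_j kron(h_j h_j^T, K_j) + kron(diag(noise_var), I_n)
    # where h_j is the j-th column of the mixing matrix H.
    n = x.shape[0]
    p, q = H.shape
    K = np.zeros((n * p, n * p))
    for j in range(q):
        B_j = np.outer(H[:, j], H[:, j])  # rank-one coregionalization matrix
        K += np.kron(B_j, rbf_kernel(x, lengthscales[j]))
    K += np.kron(np.diag(noise_var), np.eye(n))  # independent per-task noise
    return K

rng = np.random.default_rng(0)
n, p, q = 30, 4, 2  # datapoints, tasks, latent processes
x = np.linspace(0.0, 1.0, n)
H = rng.normal(size=(p, q))
K = naive_lmc_covariance(x, H, lengthscales=[0.2, 0.5], noise_var=np.full(p, 0.1))

# The Cholesky factorization of the (n*p) x (n*p) matrix is the cubic bottleneck.
L = np.linalg.cholesky(K)
y = L @ rng.normal(size=n * p)  # one synthetic draw from the model
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y via the factor
```

Here the factorized matrix has side `n * p`, so the cost grows as the cube of that product regardless of how few latent processes there are. This is the cost that decoupling the latent processes is meant to avoid.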