Multi-output Gaussian processes (MOGPs) handle multiple tasks by exploiting the correlations between outputs. MOGP models generally assume a flat correlation structure between the outputs. Such a formulation, however, does not account for more elaborate relationships, for instance when several replicates are observed for each output (a typical setting in biological experiments). This paper proposes an extension of MOGPs for hierarchical datasets, i.e. datasets in which the relationships between observations can be represented as a tree. Our model defines a tailored kernel function that accounts for hierarchical structure in the data and captures correlations at different levels, while latent variables express the underlying dependencies between outputs through a dedicated kernel. This latter feature is expected to significantly improve scalability as the number of tasks increases. An extensive experimental study on both synthetic and real-world data from genomics and motion capture is presented to support our claims.
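To make the idea of a kernel with several levels of correlation concrete, here is a minimal sketch in Python. It is not the kernel defined in the paper. It assumes a two-level hierarchy (outputs, then replicates within each output), one latent vector per output stored in a hypothetical array `H`, squared-exponential kernels throughout, and hyperparameter values chosen arbitrarily for illustration. The function names `rbf` and `hierarchical_kernel` are likewise invented for this example.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a (n, q) and b (m, q)."""
    a = np.atleast_2d(a)
    b = np.atleast_2d(b)
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def hierarchical_kernel(t, out_idx, rep_idx, H):
    """Illustrative two-level covariance (an assumption, not the paper's kernel).

    t       : (n,) input locations (e.g. time points)
    out_idx : (n,) integer output label of each observation
    rep_idx : (n,) integer replicate label of each observation
    H       : (D, Q) one latent vector per output (hypothetical)
    """
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    out_idx = np.asarray(out_idx)
    rep_idx = np.asarray(rep_idx)

    # Upper level: dependencies between outputs come from a kernel on the
    # latent vectors, multiplied by a kernel over the inputs shared by all.
    K_out = rbf(H, H)[np.ix_(out_idx, out_idx)]
    K_upper = K_out * rbf(t, t, lengthscale=1.0)

    # Lower level: extra covariance only between observations that belong
    # to the same replicate of the same output.
    same_rep = (out_idx[:, None] == out_idx[None, :]) & (rep_idx[:, None] == rep_idx[None, :])
    K_lower = same_rep * rbf(t, t, lengthscale=0.5, variance=0.3)

    return K_upper + K_lower

# Toy layout: 2 outputs x 2 replicates x 3 input locations = 12 observations.
t = np.tile([0.0, 1.0, 2.0], 4)
out_idx = np.repeat([0, 1], 6)
rep_idx = np.tile(np.repeat([0, 1], 3), 2)
H = np.array([[0.0], [1.0]])  # assumed 1-D latent vector per output

K = hierarchical_kernel(t, out_idx, rep_idx, H)
```

In this toy layout, two observations at the same input location are most strongly correlated when they share a replicate, less so when they share only an output, and least when they come from different outputs. The number of parameters describing the output dependencies grows with the size of `H` rather than with the square of the number of outputs, which is one plausible reading of the scalability remark above.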