Neural processes (NPs) are a powerful family of meta-learning models that seek to approximate the posterior predictive map of the ground-truth stochastic process from which each dataset in a meta-dataset is sampled. There are many cases in which practitioners, besides having access to the dataset of interest, may also have access to other datasets that share similarities with it. In this case, integrating these datasets into the NP can improve predictions. We equip NPs with this functionality and describe this paradigm as in-context in-context learning. Standard NP architectures, such as the convolutional conditional NP (ConvCNP) or the family of transformer neural processes (TNPs), are not capable of in-context in-context learning, as they are only able to condition on a single dataset. We address this shortcoming by developing the in-context in-context learning pseudo-token TNP (ICICL-TNP). The ICICL-TNP builds on the family of PT-TNPs, which utilise pseudo-token-based transformer architectures to sidestep the quadratic computational complexity associated with regular transformer architectures. Importantly, the ICICL-TNP is capable of conditioning on both sets of datapoints and sets of datasets, enabling it to perform in-context in-context learning. We demonstrate the importance of in-context in-context learning and the effectiveness of the ICICL-TNP in a number of experiments.
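To make the pseudo-token idea concrete, the following is a minimal sketch (not the authors' implementation) of a pseudo-token attention block of the kind used in PT-TNPs: a small set of M learned pseudo-tokens cross-attends to the N context points, and the targets then attend only to the pseudo-tokens, so the cost scales as O(MN + TM) rather than the O(N^2) of full self-attention over the context. Names such as `PseudoTokenBlock` and `num_pseudo` are illustrative assumptions, not identifiers from the paper.

```python
# Minimal sketch of a pseudo-token (Perceiver-style) attention block,
# assuming PyTorch; illustrative only, not the ICICL-TNP architecture.
import torch
import torch.nn as nn


class PseudoTokenBlock(nn.Module):
    def __init__(self, dim: int, num_pseudo: int, num_heads: int = 4):
        super().__init__()
        # Learned pseudo-tokens, shared across tasks.
        self.pseudo = nn.Parameter(torch.randn(num_pseudo, dim))
        # Pseudo-tokens attend to the context set: cost O(M * N).
        self.to_pseudo = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Target inputs attend to the small pseudo-token summary: cost O(T * M).
        self.to_target = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, context: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # context: (batch, N, dim) embedded context points
        # target:  (batch, T, dim) embedded target inputs
        b = context.shape[0]
        pseudo = self.pseudo.unsqueeze(0).expand(b, -1, -1)
        summary, _ = self.to_pseudo(pseudo, context, context)
        out, _ = self.to_target(target, summary, summary)
        return out


# Usage sketch with hypothetical sizes.
block = PseudoTokenBlock(dim=64, num_pseudo=16)
ctx = torch.randn(8, 200, 64)   # 200 context points per task
tgt = torch.randn(8, 50, 64)    # 50 target inputs per task
out = block(ctx, tgt)           # shape (8, 50, 64)
```

Under the same assumptions, in-context in-context learning could be sketched by summarising each related dataset into its own set of pseudo-tokens and concatenating these summaries before the target cross-attention, so that predictions condition on a set of datasets rather than a single one.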