Conditional neural processes (CNPs; Garnelo et al., 2018a) are attractive meta-learning models which produce well-calibrated predictions and are trainable via a simple maximum likelihood procedure. Although CNPs have many advantages, they are unable to model dependencies in their predictions. Various works propose solutions to this, but these come at the cost of either requiring approximate inference or being limited to Gaussian predictions. In this work, we instead propose to change how CNPs are deployed at test time, without any modifications to the model or training procedure. Instead of making predictions independently for every target point, we autoregressively define a joint predictive distribution using the chain rule of probability, taking inspiration from the neural autoregressive density estimator (NADE) literature. We show that this simple procedure allows factorised Gaussian CNPs to model highly dependent, non-Gaussian predictive distributions. Perhaps surprisingly, in an extensive range of tasks with synthetic and real data, we show that CNPs in autoregressive (AR) mode not only significantly outperform non-AR CNPs, but are also competitive with more sophisticated models that are significantly more computationally expensive and challenging to train. This performance is remarkable given that AR CNPs are not trained to model joint dependencies. Our work provides an example of how ideas from neural distribution estimation can benefit neural processes, and motivates research into the AR deployment of other neural process models.
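To make the autoregressive deployment concrete, the following is a minimal sketch of the chain-rule procedure, p(y_1, ..., y_N | D) = prod_n p(y_n | D, y_1, ..., y_{n-1}), in which each target is predicted with a factorised Gaussian and then appended to the context. It is an illustration under stated assumptions, not the authors' implementation. The function `marginal_predict` is a hypothetical stand-in for a trained CNP: it returns the marginals of an exact Gaussian process posterior, chosen only so that the example is self-contained and can be checked against a known joint distribution. The names `rbf`, `ar_logpdf` and `ar_sample`, the kernel, and all numerical values are likewise assumptions made for this sketch.

```python
import numpy as np


def rbf(a, b, lengthscale=0.5):
    # Squared-exponential kernel between two 1-D input arrays.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def marginal_predict(xc, yc, xt, noise=1e-2):
    # HYPOTHETICAL stand-in for a trained factorised Gaussian CNP.
    # Returns one independent mean and variance per target input.
    # Here: marginals of an exact GP posterior (an assumption of this sketch).
    kcc = rbf(xc, xc) + noise * np.eye(len(xc))
    kct = rbf(xc, xt)
    mean = kct.T @ np.linalg.solve(kcc, yc)
    var = 1.0 + noise - np.sum(kct * np.linalg.solve(kcc, kct), axis=0)
    return mean, var


def ar_logpdf(xc, yc, xt, yt):
    # Joint log-density of the targets via the chain rule: score one target
    # at a time, then feed it back into the context set.
    total = 0.0
    for x, y in zip(xt, yt):
        mean, var = marginal_predict(xc, yc, np.array([x]))
        total += -0.5 * (np.log(2 * np.pi * var[0]) + (y - mean[0]) ** 2 / var[0])
        xc, yc = np.append(xc, x), np.append(yc, y)
    return total


def ar_sample(xc, yc, xt, rng):
    # Draw one correlated joint sample: sample a target from its marginal,
    # append the sample to the context, and move on to the next target.
    samples = []
    for x in xt:
        mean, var = marginal_predict(xc, yc, np.array([x]))
        y = rng.normal(mean[0], np.sqrt(var[0]))
        samples.append(y)
        xc, yc = np.append(xc, x), np.append(yc, y)
    return np.array(samples)


# Toy context and target sets (made-up numbers for illustration).
xc = np.array([-1.0, 0.0, 1.0])
yc = np.array([0.2, -0.1, 0.4])
xt = np.array([-0.5, 0.5, 0.6])
yt = np.array([0.0, 0.3, 0.35])

ar_ll = ar_logpdf(xc, yc, xt, yt)
sample = ar_sample(xc, yc, xt, np.random.default_rng(0))
```

Because the stand-in predictor here is an exact GP, the AR log-density coincides with the GP's joint Gaussian log-density. With a real CNP the marginals are learned, so the AR joint generally depends on the order in which targets are visited and need not be Gaussian. Note also that the loop calls the predictor once per target, so AR deployment costs one forward pass per target point rather than a single pass for all targets.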