Neural Processes (NPs) are a rapidly evolving class of models designed to directly model the posterior predictive distribution of stochastic processes. While early architectures were developed primarily as a scalable alternative to Gaussian Processes (GPs), modern NPs tackle far more complex and data-hungry applications spanning geology, epidemiology, climate, and robotics. These applications have placed increasing pressure on the scalability of these models, with many architectures compromising accuracy for scalability. In this paper, we demonstrate that this trade-off is often unnecessary, particularly when modeling fully or partially translation-invariant processes. We propose a versatile new architecture, the Biased Scan Attention Transformer Neural Process (BSA-TNP), which introduces Kernel Regression Blocks (KRBlocks), group-invariant attention biases, and memory-efficient Biased Scan Attention (BSA). BSA-TNP is able to: (1) match or exceed the accuracy of the best models while often training in a fraction of the time, (2) exhibit translation invariance, enabling learning at multiple resolutions simultaneously, (3) transparently model processes that evolve in both space and time, (4) support high-dimensional fixed effects, and (5) scale gracefully, running inference on over 1M test points and 100K context points in under a minute on a single 24GB GPU. Code is provided as part of the `dl4bi` package.
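To make the notion of a translation-invariant attention bias concrete, the following is a minimal sketch and not the BSA-TNP implementation or the `dl4bi` API. It assumes that the bias added to each attention logit depends only on the difference between query and key locations, and that the features themselves carry no absolute position. The names `distance_bias`, `biased_attention`, and `shift`, the squared-distance form of the bias, and the `lengthscale` parameter are all hypothetical choices made for illustration. Under these assumptions, shifting every location by the same offset leaves the attention output unchanged.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distance_bias(q_loc, k_loc, lengthscale=1.0):
    """Hypothetical bias: a function of q_loc - k_loc only, so it is
    unchanged when both locations are shifted by the same offset."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(q_loc, k_loc))
    return -sq_dist / (2.0 * lengthscale ** 2)


def biased_attention(q_feats, k_feats, values, q_locs, k_locs,
                     bias_fn=distance_bias):
    """Single-head attention with an additive, location-dependent bias.

    logit[i][j] = <q_i, k_j> / sqrt(d) + bias_fn(x_i, x_j)
    """
    d = len(k_feats[0])
    outputs = []
    for q, q_loc in zip(q_feats, q_locs):
        logits = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            + bias_fn(q_loc, k_loc)
            for k, k_loc in zip(k_feats, k_locs)
        ]
        weights = softmax(logits)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


def shift(locs, offset):
    """Translate every location by the same offset vector."""
    return [[x + o for x, o in zip(loc, offset)] for loc in locs]


# Toy data (made up for illustration): 3 context points, 2 query points.
k_locs = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.5]]
q_locs = [[0.5, 0.5], [1.5, 1.0]]
k_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # no absolute position inside
q_feats = [[0.5, 0.5], [1.0, -1.0]]
values = [[1.0], [2.0], [3.0]]

original = biased_attention(q_feats, k_feats, values, q_locs, k_locs)

# Shift all locations by a common offset; the outputs should not change.
offset = [10.0, -4.0]
shifted = biased_attention(
    q_feats, k_feats, values, shift(q_locs, offset), shift(k_locs, offset)
)
```

With zero query and key features, this reduces to a softmax-weighted average of the values with weights driven by distance alone, which resembles kernel regression. Whether KRBlocks take this form is not stated in the abstract.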