Memory-based meta-learning is a technique for approximating Bayes-optimal predictors. Under fairly general conditions, minimizing sequential prediction error, measured by the log loss, leads to implicit meta-learning. The goal of this work is to investigate how far this interpretation can be realized by current sequence prediction models and training regimes. The focus is on piecewise stationary sources with unobserved switching-points, which arguably capture an important characteristic of natural language and action-observation sequences in partially observable environments. We show that various types of memory-based neural models, including Transformers, LSTMs, and RNNs, can learn to accurately approximate known Bayes-optimal algorithms and behave as if performing Bayesian inference over the latent switching-points and the latent parameters governing the data distribution within each segment.
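To make concrete what it means to perform Bayesian inference over latent switching-points and per-segment parameters, here is a minimal sketch of an exact Bayesian predictor for one simple piecewise stationary source. It assumes binary observations, a constant switching probability `hazard` at every step, and a Beta(a, b) prior on each segment's Bernoulli parameter (a = b = 1/2 gives the Krichevsky-Trofimov estimator). This is a swapped-in textbook construction (a run-length posterior in the style of Bayesian online changepoint detection), not necessarily the exact source family or Bayes-optimal algorithm studied in this work; the function names `sample_piecewise_bernoulli`, `bayes_switching_predictor`, and `log_loss` are hypothetical.

```python
import math
import random


def sample_piecewise_bernoulli(n, hazard, a=0.5, b=0.5, seed=0):
    """Draw n bits from a piecewise stationary Bernoulli source (illustrative assumption).

    At every step after the first, a new segment starts with probability `hazard`;
    each segment's bias is drawn afresh from Beta(a, b).
    """
    rng = random.Random(seed)
    theta = rng.betavariate(a, b)
    xs = []
    for t in range(n):
        if t > 0 and rng.random() < hazard:
            theta = rng.betavariate(a, b)  # unobserved switching-point
        xs.append(1 if rng.random() < theta else 0)
    return xs


def bayes_switching_predictor(xs, hazard, a=0.5, b=0.5):
    """Return P(x_t = 1 | x_1..x_{t-1}) for each t, marginalizing over the run length
    (time since the last switch) and the Bernoulli parameter of the current segment.

    Assumes a known, constant hazard rate and a Beta(a, b) prior per segment.
    """
    runs = []   # posterior over run lengths: (weight, ones, zeros) per hypothesis
    preds = []
    for t, x in enumerate(xs):
        if t == 0:
            hyps = [(1.0, 0, 0)]
        else:
            # Either the current segment continues (prob 1 - hazard) ...
            hyps = [(w * (1.0 - hazard), n1, n0) for w, n1, n0 in runs]
            # ... or a new segment starts with fresh counts (old weights sum to 1).
            hyps.append((hazard, 0, 0))
        # Predictive probability: posterior mixture of Beta-Bernoulli estimates.
        p1 = sum(w * (n1 + a) / (n1 + n0 + a + b) for w, n1, n0 in hyps)
        preds.append(p1)
        # Bayes update: reweight each hypothesis by the likelihood of x.
        updated = []
        for w, n1, n0 in hyps:
            q1 = (n1 + a) / (n1 + n0 + a + b)
            lik = q1 if x == 1 else 1.0 - q1
            updated.append((w * lik, n1 + x, n0 + 1 - x))
        z = sum(w for w, _, _ in updated)  # equals the predictive probability of x
        runs = [(w / z, n1, n0) for w, n1, n0 in updated]
    return preds


def log_loss(xs, preds):
    """Cumulative log loss in nats: -sum_t log P(x_t | x_<t)."""
    return -sum(math.log(p if x == 1 else 1.0 - p) for x, p in zip(xs, preds))


if __name__ == "__main__":
    xs = sample_piecewise_bernoulli(500, hazard=0.01)
    switching = log_loss(xs, bayes_switching_predictor(xs, hazard=0.01))
    stationary = log_loss(xs, bayes_switching_predictor(xs, hazard=0.0))
    print(f"switching prior: {switching:.1f} nats, stationary KT: {stationary:.1f} nats")
```

The cumulative log loss computed by `log_loss` is the kind of objective a memory-based sequence model would minimize during training; a model whose step-by-step predictions match `bayes_switching_predictor` on this source would be approximating its Bayes-optimal predictor. Setting `hazard=0.0` recovers a stationary estimator, which makes the cost of ignoring switching-points easy to compare. Because the sum runs over every possible run length, each step costs O(t) and a full pass is quadratic in sequence length.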