Building on the author's recent work on stochastic approximation in non-Markovian environments, we consider the case where the driving random process is non-ergodic as well as non-Markovian. Using this analysis, we propose an analytic framework for understanding transformer-based learning, specifically the "attention" mechanism, and continual learning, both of which depend in principle on the entire past.