Transformers are a class of autoregressive deep learning architectures that have recently achieved state-of-the-art performance in a range of vision, language, and robotics tasks. We revisit the problem of Kalman filtering in linear dynamical systems and show that Transformers can approximate the Kalman Filter in a strong sense. Specifically, for any observable linear time-invariant (LTI) system, we construct an explicit causally masked Transformer that implements the Kalman Filter up to a small additive error that is bounded uniformly in time; we call this construction the Transformer Filter. The construction rests on a two-step reduction. First, we show that a softmax self-attention block can exactly represent a certain Gaussian kernel smoothing estimator. Second, we show that this estimator closely approximates the Kalman Filter. We also investigate how the Transformer Filter can be used for measurement-feedback control and prove that the resulting nonlinear controllers closely approximate the performance of standard optimal control policies such as the linear-quadratic-Gaussian (LQG) controller.
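The first step of the reduction, that softmax attention can represent a Gaussian kernel smoother, can be illustrated with a minimal NumPy sketch. This is not the paper's construction: it only checks the standard identity that a softmax over scores of the form (q·k − ‖k‖²/2)/σ² reproduces normalized Gaussian kernel weights, because the ‖q‖² term is constant across keys and cancels in the normalization. The function names, the bandwidth `sigma`, and the feature augmentation used here are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kernel_smoother(query, keys, values, sigma):
    """Nadaraya-Watson estimate of the value at `query` with a Gaussian kernel."""
    sq_dists = np.sum((keys - query) ** 2, axis=1)
    weights = np.exp(-sq_dists / (2.0 * sigma**2))
    return weights @ values / weights.sum()


def softmax(scores):
    """Numerically stable softmax over a 1-D score vector."""
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()


def softmax_attention(query, keys, values, sigma):
    """Single-query dot-product attention on augmented features (illustrative).

    Assumed augmentation: q' = [q, 1] / sigma^2 and k' = [k, -||k||^2 / 2],
    so that q' . k' = (q . k - ||k||^2 / 2) / sigma^2. This differs from
    -||q - k||^2 / (2 sigma^2) only by a term that is the same for every key.
    """
    q_aug = np.append(query, 1.0) / sigma**2
    k_aug = np.hstack([keys, -0.5 * np.sum(keys**2, axis=1, keepdims=True)])
    scores = k_aug @ q_aug
    return softmax(scores) @ values


# Hypothetical data: 8 past tokens with 3-dim keys and 2-dim values.
rng = np.random.default_rng(0)
keys = rng.normal(size=(8, 3))
values = rng.normal(size=(8, 2))
query = rng.normal(size=3)

smoothed = gaussian_kernel_smoother(query, keys, values, sigma=1.0)
attended = softmax_attention(query, keys, values, sigma=1.0)
assert np.allclose(smoothed, attended)
```

In a causally masked setting, `keys` and `values` would hold only the tokens up to the current time step, so the same comparison applies position by position.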