The use of attention-based deep learning models, e.g. transformers and deep Kalman filters, in stochastic filtering has recently attracted attention; however, the potential of these models to solve stochastic filtering problems remains largely unknown. This paper provides an affirmative answer to this open problem in the theoretical foundations of machine learning by showing that a class of continuous-time transformer models, called \textit{filterformers}, can approximately implement the conditional law of a broad class of non-Markovian and conditionally Gaussian signal processes given noisy continuous-time (possibly non-Gaussian) measurements. Our approximation guarantees hold uniformly over sufficiently regular compact subsets of continuous-time paths, where the approximation error is quantified by the worst-case $2$-Wasserstein distance between the true optimal filter and our deep learning model. Our construction relies on two new customizations of the standard attention mechanism. The first can losslessly adapt to the characteristics of a broad range of paths: we show that this attention mechanism implements bi-Lipschitz embeddings of sufficiently regular sets of paths into low-dimensional Euclidean spaces, and thus it incurs no ``dimension reduction error''. The second attention mechanism is tailored to the geometry of Gaussian measures in the $2$-Wasserstein space. Our analysis relies on new stability estimates of robust optimal filters in the conditionally Gaussian setting.
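To make the error metric concrete, the following is a minimal sketch, not the paper's construction. It uses the standard closed form for the $2$-Wasserstein distance between Gaussian measures, $W_2^2 = \|m_1-m_2\|^2 + \operatorname{tr}\big(\Sigma_1+\Sigma_2-2(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\big)$, restricted to the simplifying assumption of diagonal covariances, where it reduces to a Euclidean distance between means and standard deviations. The function names `w2_diag_gaussians` and `worst_case_error` are hypothetical, and the maximum over a finite list is only a stand-in for the supremum over a compact set of paths.

```python
import math


def w2_diag_gaussians(mean_a, std_a, mean_b, std_b):
    """2-Wasserstein distance between N(mean_a, diag(std_a^2)) and
    N(mean_b, diag(std_b^2)).

    Assumption: both covariances are diagonal, so the trace term of the
    general formula reduces to sum_i (std_a[i] - std_b[i])^2.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))
    cov_term = sum((sa - sb) ** 2 for sa, sb in zip(std_a, std_b))
    return math.sqrt(mean_term + cov_term)


def worst_case_error(true_filters, model_filters):
    """Largest W2 gap over paired (mean, std) Gaussians.

    Each list entry is the Gaussian filter output for one observation
    path; the max over this finite list is a crude proxy for a
    worst-case error over a compact set of paths.
    """
    return max(
        w2_diag_gaussians(m, s, m_hat, s_hat)
        for (m, s), (m_hat, s_hat) in zip(true_filters, model_filters)
    )
```

For example, `w2_diag_gaussians([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0])` compares two Gaussians that differ only in their means, so the distance is the Euclidean distance between the means.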