We establish the first end-to-end sample complexity guarantee for model-free policy gradient (PG) methods in discrete-time infinite-horizon Kalman filtering. Specifically, we introduce the receding-horizon policy gradient (RHPG-KF) framework and show that RHPG-KF learns a stabilizing filter that is $\epsilon$-close to the optimal Kalman filter with $\tilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity. Notably, the RHPG-KF framework neither requires the system to be open-loop stable nor assumes prior knowledge of a stabilizing filter. Our results shed light on applying model-free PG methods to control linear dynamical systems in which the state measurements may be corrupted by statistical noise and other (possibly adversarial) disturbances.
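For orientation, the optimal Kalman filter that serves as the benchmark above can be computed directly when the model is known. The following is a minimal model-based sketch, not the RHPG-KF algorithm itself: it iterates the standard filtering Riccati recursion to obtain the steady-state predictor gain. The function name `steady_state_kalman_gain`, the symbols `A`, `C`, `W`, `V`, and the scalar example values are all illustrative assumptions. The example system is deliberately open-loop unstable to show that a stabilizing filter still exists in that case.

```python
import numpy as np


def steady_state_kalman_gain(A, C, W, V, iters=500):
    """Iterate the filtering Riccati recursion (model-based reference).

    Assumed model (hypothetical notation):
        x_{t+1} = A x_t + w_t,  w_t ~ N(0, W)
        y_t     = C x_t + v_t,  v_t ~ N(0, V)
    Returns the predictor-form gain L and the prediction error covariance P.
    """
    P = np.eye(A.shape[0])
    for _ in range(iters):
        S = C @ P @ C.T + V                  # innovation covariance
        L = A @ P @ C.T @ np.linalg.inv(S)   # predictor-form Kalman gain
        P = A @ P @ A.T - L @ S @ L.T + W    # Riccati update
    S = C @ P @ C.T + V
    L = A @ P @ C.T @ np.linalg.inv(S)
    return L, P


# Illustrative scalar system that is open-loop unstable (|A| > 1).
A = np.array([[1.2]])
C = np.array([[1.0]])
W = np.array([[1.0]])
V = np.array([[1.0]])

L, P = steady_state_kalman_gain(A, C, W, V)

# Spectral radius of the open-loop and of the estimation-error dynamics.
rho_open = np.max(np.abs(np.linalg.eigvals(A)))
rho_filter = np.max(np.abs(np.linalg.eigvals(A - L @ C)))
print("open-loop spectral radius:", rho_open)
print("error-dynamics spectral radius:", rho_filter)
```

In this sketch the filter is stabilizing when the spectral radius of `A - L @ C` is below one. A model-free method would have to reach a gain close to `L` from samples alone, without access to `A`, `C`, `W`, or `V`.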