Accurate disturbance estimation is essential for safe robot operations. The recently proposed neural moving horizon estimation (NeuroMHE), which uses a portable neural network to model the MHE's weightings, has shown promise in pushing the accuracy and efficiency boundary further. Currently, NeuroMHE is trained through gradient descent, with its gradient computed recursively using a Kalman filter. This paper proposes a trust-region policy optimization method for training NeuroMHE. We achieve this by providing the second-order derivatives of MHE, referred to as the MHE Hessian. Remarkably, we show that much of the computation already used to obtain the gradient, especially the Kalman filter, can be efficiently reused to compute the MHE Hessian. As a result, the computational complexity is linear in the MHE horizon. As a case study, we evaluate the proposed trust-region NeuroMHE on real quadrotor flight data for disturbance estimation. Our approach trains efficiently, in under 5 min using only 100 data points. It outperforms a state-of-the-art neural estimator by up to 68.1% in force estimation accuracy while using only 1.4% of its network parameters. Furthermore, our method shows enhanced robustness to network initialization compared to its gradient descent counterpart.
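To make the role of second-order information concrete, the following is a minimal sketch of a generic trust-region iteration that uses a gradient and a Hessian. It is not the NeuroMHE training procedure described above: the toy objective, the Cauchy-point step, the radius-update thresholds, and all function names (`cauchy_step`, `trust_region_minimize`, `toy_f`, `toy_grad`, `toy_hess`) are assumptions chosen only for illustration.

```python
import math


def cauchy_step(g, H, delta):
    """Minimize the quadratic model along -g inside a ball of radius delta."""
    gnorm = math.sqrt(sum(gi * gi for gi in g))
    Hg = [sum(hij * gj for hij, gj in zip(row, g)) for row in H]
    gHg = sum(gi * hgi for gi, hgi in zip(g, Hg))
    # If the curvature along -g is non-positive, go all the way to the boundary.
    tau = 1.0 if gHg <= 0 else min(1.0, gnorm ** 3 / (delta * gHg))
    return [-tau * delta / gnorm * gi for gi in g]


def trust_region_minimize(f, grad, hess, x, delta=1.0, delta_max=10.0,
                          eta=0.1, tol=1e-8, max_iter=200):
    """Basic trust-region loop (illustrative assumption, not the paper's method)."""
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break
        p = cauchy_step(g, H, delta)
        Hp = [sum(hij * pj for hij, pj in zip(row, p)) for row in H]
        # Reduction predicted by the quadratic model built from g and H.
        pred = -(sum(gi * pi for gi, pi in zip(g, p))
                 + 0.5 * sum(pi * hpi for pi, hpi in zip(p, Hp)))
        x_new = [xi + pi for xi, pi in zip(x, p)]
        rho = (f(x) - f(x_new)) / pred  # actual vs. predicted reduction
        pnorm = math.sqrt(sum(pi * pi for pi in p))
        if rho < 0.25:
            delta *= 0.25  # model was poor: shrink the region
        elif rho > 0.75 and pnorm >= 0.999 * delta:
            delta = min(2.0 * delta, delta_max)  # model was good: expand
        if rho > eta:
            x = x_new  # accept the step only if it reduced f enough
    return x


# Toy convex objective with its minimum at (1, -2); purely hypothetical.
def toy_f(x):
    u, v = x[0] - 1.0, x[1] + 2.0
    return 0.5 * u * u + 2.0 * v * v + 0.25 * u ** 4


def toy_grad(x):
    u, v = x[0] - 1.0, x[1] + 2.0
    return [u + u ** 3, 4.0 * v]


def toy_hess(x):
    u = x[0] - 1.0
    return [[1.0 + 3.0 * u * u, 0.0], [0.0, 4.0]]


x_star = trust_region_minimize(toy_f, toy_grad, toy_hess, [3.0, 1.0])
```

The ratio `rho` compares the actual decrease in the objective with the decrease predicted by the quadratic model. The Hessian is what lets the model capture curvature, so the step size adapts without a hand-tuned learning rate.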