We improve the accuracy of Guidance & Control Networks (G&CNETs) trained to represent the optimal control policies of a time-optimal transfer and of a mass-optimal landing. In both cases we leverage the dynamics of the spacecraft, described by Ordinary Differential Equations (ODEs) that incorporate a neural network on their right-hand side (Neural ODEs). Because the neural dynamics are differentiable, the sensitivities of the ODE solution to the network parameters can be computed from the variational equations, which allows the G&CNET parameters to be updated based on the observed dynamics. We start with a straightforward regression task, training the G&CNETs on datasets of optimal trajectories using behavioural cloning. These networks are then refined using the Neural ODE sensitivities by minimizing the error between the final states and the target states. We demonstrate that, for the orbital transfer, the final error to the target can be reduced by 99% on a single trajectory and by 70% on a batch of 500 trajectories. For the landing problem, the reduction in error is around 98-99% in position and 40-44% in velocity. This refinement step significantly improves the accuracy of G&CNETs, increasing confidence in their reliability for operational use. We also compare our results to the popular Dataset Aggregation method (DaGGER) and outline the strengths and weaknesses of both methods.
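The refinement idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the setup used in this work: the dynamics are a toy double integrator, the "network" is a single tanh neuron with parameters `theta = (w1, w2, b)`, the target state is the origin, and the names `augmented_rhs`, `rk4_integrate`, `loss_and_grad` and `refine` are hypothetical. The sketch integrates the state together with the variational equations for the sensitivities of the state to `theta`, then uses those sensitivities to lower the squared final-state error by gradient descent with a simple backtracking step.

```python
import math


def augmented_rhs(y, theta):
    """State equations plus variational equations for a toy double integrator.

    y = [r, v, S_r (3 values), S_v (3 values)], with S = d(r, v)/d(theta).
    """
    r, v = y[0], y[1]
    s_r, s_v = y[2:5], y[5:8]
    w1, w2, b = theta
    u = math.tanh(w1 * r + w2 * v + b)  # toy one-neuron policy (assumption)
    du = 1.0 - u * u                    # tanh derivative at the pre-activation
    # Explicit dependence of the acceleration u on theta at a fixed state.
    du_dtheta = [du * r, du * v, du]
    # dS_r/dt = S_v ; dS_v/dt = (du/dr) S_r + (du/dv) S_v + du/dtheta
    ds_r = list(s_v)
    ds_v = [du * (w1 * s_r[k] + w2 * s_v[k]) + du_dtheta[k] for k in range(3)]
    return [v, u] + ds_r + ds_v


def rk4_integrate(y, theta, t_final, steps):
    """Fixed-step classical Runge-Kutta integration of the augmented system."""
    h = t_final / steps
    for _ in range(steps):
        k1 = augmented_rhs(y, theta)
        k2 = augmented_rhs([yi + 0.5 * h * ki for yi, ki in zip(y, k1)], theta)
        k3 = augmented_rhs([yi + 0.5 * h * ki for yi, ki in zip(y, k2)], theta)
        k4 = augmented_rhs([yi + h * ki for yi, ki in zip(y, k3)], theta)
        y = [yi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
    return y


def loss_and_grad(theta, x0, t_final=2.0, steps=200):
    """Squared distance of the final state from the origin, and its gradient."""
    y = rk4_integrate([x0[0], x0[1]] + [0.0] * 6, theta, t_final, steps)
    r, v = y[0], y[1]
    loss = r * r + v * v
    # Chain rule through the final-state sensitivities.
    grad = [2.0 * r * y[2 + k] + 2.0 * v * y[5 + k] for k in range(3)]
    return loss, grad


def refine(theta, x0, iterations=20, step=0.5):
    """Gradient descent on the final-state error with step halving."""
    loss, grad = loss_and_grad(theta, x0)
    for _ in range(iterations):
        lr = step
        while lr > 1e-8:
            trial = [p - lr * g for p, g in zip(theta, grad)]
            trial_loss, trial_grad = loss_and_grad(trial, x0)
            if trial_loss < loss:  # accept only steps that reduce the error
                theta, loss, grad = trial, trial_loss, trial_grad
                break
            lr *= 0.5
    return theta, loss
```

As a usage example, `refine([-1.0, -1.0, 0.0], (1.0, 0.0))` returns updated parameters and the corresponding final-state error for a single initial condition. Averaging `loss_and_grad` over several initial conditions would give a batch version of the same update.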