This research introduces a novel application of a masked Proximal Policy Optimization (PPO) algorithm from deep reinforcement learning (RL) to determine the most time-efficient sequence of space debris visitation, using Izzo's adaptation of the Lambert solver for each individual rendezvous. The aim is to optimize the order in which a given set of debris objects is visited so as to minimize the total rendezvous time for the entire mission. A neural network (NN) policy is trained on simulated space missions with varying debris fields; after training, it computes approximately optimal visitation sequences, with each transfer evaluated via Izzo's Lambert solver. Performance is benchmarked against standard mission-planning heuristics. The RL approach demonstrates a significant improvement in planning efficiency, reducing total mission time by an average of approximately 10.96\% and 13.66\% compared with the genetic and greedy algorithms, respectively, while also identifying time-efficient visitation sequences with the fastest computation time across the simulated scenarios. This approach marks a step forward in mission planning strategies for space debris removal.