This paper presents a distribution-agnostic robust trajectory-optimization framework based on chance-constrained reinforcement learning. Uncertainty enters through the initial conditions and process noise, and the only requirement is that it can be sampled. A deterministic nominal trajectory is first computed offline, and reinforcement learning is then used only to robustify that baseline through a structured affine closed-loop correction law comprising a feedforward control adjustment and time-varying feedback gains. Probabilistic feasibility is enforced empirically through rollout-based upper-tail quantiles, while terminal dispersion is regulated through covariance-feasibility penalties. The framework is assessed on two materially different trajectory design problems. The flagship case study is a three-dimensional multi-impulse Earth-Mars transfer, in which the learned policy is first benchmarked against a recent robust trajectory-optimization reference under Gaussian uncertainty and then evaluated under bounded uniform uncertainty and under process disturbances not seen during training. The second case study is a stochastic atmospheric pinpoint rocket landing problem, used to assess portability to a short-horizon continuous-thrust setting with drag, mass depletion, and glide-slope constraints. The results show that the proposed framework can remain competitive in upper-tail fuel cost while preserving probabilistic feasibility, and that the same robustification scaffold can be carried across heterogeneous spacecraft trajectory planning problems without redesigning its core stochastic-control structure.
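As a rough illustration of the two mechanisms named above, the following minimal sketch applies an affine correction (feedforward adjustment plus time-varying feedback gain) to a nominal control sequence and checks a terminal constraint through an empirical upper-tail quantile over sampled rollouts. It is not the paper's implementation: the scalar dynamics x_{k+1} = x_k + u_k + w_k, the nominal trajectory, the tolerance, the noise levels, and all function names (`affine_control`, `rollout`, `upper_quantile`, `evaluate`, and the two samplers) are assumptions chosen for illustration, and the gains are fixed by hand rather than learned.

```python
import math
import random


def affine_control(u_nom, du, gain, x, x_nom):
    """Nominal control plus feedforward adjustment plus feedback on the state deviation."""
    return u_nom + du + gain * (x - x_nom)


def rollout(x0, x_nom, u_nom, du, gains, noise):
    """Propagate the assumed toy dynamics x_{k+1} = x_k + u_k + w_k; return final state and control cost."""
    x, cost = x0, 0.0
    for k in range(len(u_nom)):
        u = affine_control(u_nom[k], du[k], gains[k], x, x_nom[k])
        cost += abs(u)  # stand-in for fuel cost
        x = x + u + noise[k]
    return x, cost


def upper_quantile(samples, level):
    """Empirical quantile: smallest sample with at least `level` of the data at or below it."""
    ordered = sorted(samples)
    index = max(0, math.ceil(level * len(ordered)) - 1)
    return ordered[index]


def gaussian_sampler(rng, horizon):
    """Hypothetical Gaussian uncertainty in the initial state and process noise."""
    return rng.gauss(0.0, 0.1), [rng.gauss(0.0, 0.02) for _ in range(horizon)]


def uniform_sampler(rng, horizon):
    """Hypothetical bounded uniform uncertainty; the evaluation only needs samples."""
    return rng.uniform(-0.15, 0.15), [rng.uniform(-0.03, 0.03) for _ in range(horizon)]


def evaluate(du, gains, sampler, n_rollouts=2000, level=0.99, tol=0.1, seed=0):
    """Return the upper-tail quantiles of terminal-constraint violation and of control cost."""
    rng = random.Random(seed)
    x_nom = [0.0, 0.25, 0.5, 0.75]  # assumed nominal states at steps 0..3
    u_nom = [0.25, 0.25, 0.25, 0.25]  # assumed nominal controls reaching the target
    target = 1.0
    violations, costs = [], []
    for _ in range(n_rollouts):
        x0, noise = sampler(rng, len(u_nom))
        x_final, cost = rollout(x0, x_nom, u_nom, du, gains, noise)
        violations.append(abs(x_final - target) - tol)  # <= 0 means feasible
        costs.append(cost)
    return upper_quantile(violations, level), upper_quantile(costs, level)


if __name__ == "__main__":
    zeros = [0.0] * 4
    for name, gains in [("open loop", zeros), ("feedback", [-0.8] * 4)]:
        q_violation, q_cost = evaluate(zeros, gains, gaussian_sampler)
        print(name, "violation quantile:", q_violation, "cost quantile:", q_cost)
```

In this sketch the chance constraint is treated as satisfied when the quantile of the violation samples is non-positive. Swapping `gaussian_sampler` for `uniform_sampler` changes nothing else in the evaluation, which is the sense in which a sampling-only interface is distribution-agnostic.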