Reinforcement-learning and data-driven autonomous controllers are commonly evaluated by cumulative reward and empirical success frequency over a finite number of simulated trajectories. Such empirical metrics, however, do not necessarily provide sufficient statistical evidence of deployment readiness under uncertainty. This work develops a Bayesian approval framework for learned autonomous landing controllers when only finite rollout evidence is available. A probabilistic formulation of landing capability is introduced, based on whether touchdown safety is satisfied under uncertain operating conditions, and Bayesian posterior inference is used to quantify uncertainty about the true deployment capability of a learned policy. Two deployment-oriented quantities, the posterior approval probability and the posterior deployment risk, are then introduced, together with a sequential validation framework that supports approve/reject/continue decisions during progressive rollout testing. Simulation experiments with PPO and SAC controllers show that empirical success rates and reward optimization can lead to overconfident conclusions about deployment readiness when validation evidence is limited, whereas posterior approval inference yields a better uncertainty-calibrated assessment. The proposed framework provides a practical statistical link between conventional reinforcement-learning evaluation and deployment-oriented validation under uncertainty, and it may generalize to broader classes of learned autonomous systems.
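To make the idea of posterior approval under finite rollout evidence concrete, the following is a minimal illustrative sketch and not the implementation used in this work. It assumes each rollout is an independent Bernoulli trial (safe touchdown or not) and places a uniform Beta(1, 1) prior on the landing success probability. The required capability `p_star`, the thresholds `approve_at` and `reject_at`, and both function names are hypothetical choices made only for illustration.

```python
from math import comb


def approval_probability(successes: int, failures: int, p_star: float) -> float:
    """Posterior probability that the true success rate p is at least p_star.

    Assumption: uniform Beta(1, 1) prior, so the posterior is
    Beta(successes + 1, failures + 1). For integer shape parameters the
    upper tail of a Beta distribution equals a binomial CDF:
        P(p >= p_star) = P(Binomial(n + 1, p_star) <= successes),
    where n = successes + failures.
    """
    n = successes + failures
    return sum(
        comb(n + 1, j) * p_star**j * (1.0 - p_star) ** (n + 1 - j)
        for j in range(successes + 1)
    )


def sequential_decision(
    successes: int,
    failures: int,
    p_star: float = 0.9,       # hypothetical required capability
    approve_at: float = 0.95,  # hypothetical approval threshold
    reject_at: float = 0.05,   # hypothetical rejection threshold
) -> str:
    """Return 'approve', 'reject', or 'continue' from the current evidence."""
    prob = approval_probability(successes, failures, p_star)
    if prob >= approve_at:
        return "approve"
    if prob <= reject_at:
        return "reject"
    return "continue"


if __name__ == "__main__":
    # Ten successes in ten rollouts: the empirical success rate is 100%,
    # yet the posterior evidence for p >= 0.9 is still modest.
    print(round(approval_probability(10, 0, 0.9), 4))
    print(sequential_decision(10, 0))
```

Under these assumptions, ten successful rollouts out of ten give an approval probability of 1 - 0.9^11 (about 0.69), so the sketch returns "continue" rather than "approve". This mirrors the point above that a perfect empirical success rate on limited evidence can overstate deployment readiness.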