Robustness under latent distribution shift remains challenging in partially observable reinforcement learning. We formalize a focused setting, the adversarial latent-initial-state POMDP, in which an adversary selects a hidden initial latent distribution before the episode begins. Theoretically, we prove a latent minimax principle, characterize worst-case defender distributions, and derive approximate best-response certificates with finite-sample guarantees, which give formal meaning to empirical training diagnostics. Empirically, on a Battleship benchmark, we show that targeted exposure to shifted latent distributions reduces the average robustness gap between the Spread and Uniform distributions from 10.3 to 3.1 shots at equal budget. Furthermore, iterative best-response training exhibits budget-sensitive behavior consistent with our approximate certificate theory. Overall, for latent-initial-state problems, our framework yields precise diagnostic principles and shows that structured adversarial exposure mitigates worst-case vulnerabilities.