To identify optimal constraints in complex environments, Inverse Constrained Reinforcement Learning (ICRL) recovers constraints from expert demonstrations in a data-driven manner. Existing ICRL algorithms collect training samples from an interactive environment, yet the efficacy and efficiency of their sampling strategies remain largely unexplored. To bridge this gap, we introduce a strategic exploration framework with guaranteed efficiency. Specifically, we define a feasible constraint set for ICRL problems and investigate how the expert policy and environmental dynamics influence the optimality of constraints. Motivated by these findings, we propose two exploratory algorithms that achieve efficient constraint inference by 1) dynamically reducing the bounded aggregate error of cost estimation and 2) strategically constraining the exploration policy. Both algorithms are theoretically grounded with tractable sample complexity, and we empirically demonstrate their performance in various environments.