Since optimal constraints in complex environments are difficult to specify manually, Inverse Constrained Reinforcement Learning (ICRL) seeks to recover them from expert demonstrations in a data-driven manner. Existing ICRL algorithms collect training samples from an interactive environment, yet the efficacy and efficiency of these sampling strategies remain unknown. To bridge this gap, we introduce a strategic exploration framework with provable efficiency. Specifically, we define a feasible constraint set for ICRL problems and investigate how the expert policy and environmental dynamics influence the optimality of constraints. Motivated by these findings, we propose two exploratory algorithms that achieve efficient constraint inference by 1) dynamically reducing a bounded aggregate error of cost estimation and 2) strategically constraining the exploration policy. Both algorithms are theoretically grounded with tractable sample complexity, and we empirically demonstrate their performance in various environments.