When planning with an inaccurate dynamics model, a practical strategy is to restrict planning to regions of state-action space where the model is accurate; such a region is known as a \textit{model precondition}. Empirical real-world trajectory data is valuable for defining data-driven model preconditions regardless of the model form (analytical, simulator, learned, etc.). However, real-world data is often expensive and dangerous to collect. To achieve data efficiency, this paper presents an algorithm for actively selecting trajectories to learn a model precondition for an inaccurate pre-specified dynamics model. Our proposed techniques address challenges arising from the sequential nature of trajectories and exploit the potential benefit of prioritizing task-relevant data. The experimental analysis shows how algorithmic properties affect performance in three planning scenarios: icy gridworld, simulated plant watering, and real-world plant watering. Results demonstrate an improvement of approximately 80% after only four real-world trajectories when using our proposed techniques.