When planning with an inaccurate dynamics model, a practical strategy is to restrict planning to regions of state-action space where the model is accurate; such a region is known as a model precondition. Empirical real-world trajectory data is valuable for defining data-driven model preconditions regardless of the model form (analytical, simulator, learned, etc.). However, real-world data is often expensive and dangerous to collect. To achieve data efficiency, this paper presents an algorithm for actively selecting trajectories to learn a model precondition for an inaccurate pre-specified dynamics model. Our proposed techniques address challenges arising from the sequential nature of trajectories and consider the potential benefit of prioritizing task-relevant data. The experimental analysis shows how algorithmic properties affect performance in three planning scenarios: icy gridworld, simulated plant watering, and real-world plant watering. Results demonstrate an improvement of approximately 80% after only four real-world trajectories when using our proposed techniques.
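To make the notion of a data-driven model precondition concrete, the following is a minimal sketch and not the method of this paper. It assumes a one-dimensional state, a hypothetical nominal model (`model_step`), hypothetical "real" dynamics with an icy region (`true_step`), and a nearest-neighbor rule standing in for whatever precondition learner is actually used. All names and the threshold `max_deviation` are illustrative assumptions.

```python
import math

def model_step(state, action):
    """Hypothetical nominal model: the state moves by exactly `action`."""
    return state + action

def true_step(state, action):
    """Hypothetical real dynamics: motion is halved on ice (state >= 5)."""
    return state + (0.5 * action if state >= 5.0 else action)

def collect_deviations(pairs):
    """Record |model prediction - observed next state| per (state, action)."""
    return [((s, a), abs(model_step(s, a) - true_step(s, a))) for s, a in pairs]

def in_precondition(query, dataset, max_deviation=0.1):
    """Nearest-neighbor test (an assumed stand-in for a learned classifier):
    trust the model at `query` if the closest observed (state, action)
    pair had a deviation no larger than `max_deviation`."""
    nearest = min(dataset, key=lambda item: math.dist(item[0], query))
    return nearest[1] <= max_deviation

# Four observed transitions, two off the ice and two on it.
data = collect_deviations([(0.0, 1.0), (2.0, 1.0), (5.0, 1.0), (7.0, 1.0)])
```

A planner could call `in_precondition((state, action), data)` before expanding a candidate transition and discard those for which it returns `False`, which confines planning to the region where the nominal model matched the observed data.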