Curriculum learning and imitation learning have been used extensively in robotics. However, little research has examined these ideas for control tasks over highly stochastic time-series data. Here, we explore both approaches, theoretically and empirically, on a representative control task over complex time-series data. We implement the core ideas of curriculum learning via data augmentation, and imitation learning via policy distillation from an oracle. Our findings indicate that curriculum learning should be considered a novel direction for improving control-task performance over complex time series. Out-of-sample experiments across many random seeds, together with ablation studies, are highly encouraging for curriculum learning in time-series control. These findings are especially encouraging because we tune all overlapping hyperparameters on the baseline, which gives the baseline an advantage. In contrast, we find that imitation learning should be used with caution.