Offline data are a valuable and practical resource for teaching robots complex behaviors. Ideally, learning agents should not be constrained by the scarcity of available demonstrations, but should instead generalize beyond the training distribution. However, the complexity of real-world scenarios typically requires huge amounts of data to prevent neural network policies from picking up on spurious correlations and learning non-causal relationships. We propose CAIAC, a data augmentation method that creates feasible synthetic transitions from a fixed dataset without access to online environment interactions. By using principled methods for quantifying causal influence, we perform counterfactual reasoning by swapping $\textit{action}$-unaffected parts of the state space between independent trajectories in the dataset. We show empirically that this substantially increases the robustness of offline learning algorithms against distributional shift.
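The swapping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each state is already factored into K entities of dimension D, and that a boolean mask `influenced` marks, per transition, which entities the action affects. In the method this mask would come from a causal influence measure; here it is simply given as input. The function name `swap_uninfluenced` and all array layouts are hypothetical.

```python
import numpy as np


def swap_uninfluenced(states, next_states, influenced, rng):
    """Create counterfactual transitions by swapping action-unaffected entities.

    states, next_states: float arrays of shape (N, K, D), i.e. N transitions,
        K entities per state, D features per entity (assumed factorization).
    influenced: bool array of shape (N, K); True where the action affects
        that entity in that transition (assumed to be precomputed).
    Actions are left untouched, so they are not passed in.
    """
    n = states.shape[0]
    # Pick one random donor transition for every transition in the dataset.
    donors = rng.integers(0, n, size=n)
    # Swap entity k only if the action influences it in neither the
    # original transition nor the donor transition.
    swap = ~influenced & ~influenced[donors]          # shape (N, K)
    mask = swap[..., None]                            # broadcast over D
    aug_states = np.where(mask, states[donors], states)
    aug_next_states = np.where(mask, next_states[donors], next_states)
    return aug_states, aug_next_states, donors, swap


# Toy data: entity 0 is always action-influenced, entities 1 and 2 never are.
rng = np.random.default_rng(0)
N, K, D = 6, 3, 2
states = rng.normal(size=(N, K, D))
next_states = rng.normal(size=(N, K, D))
influenced = np.zeros((N, K), dtype=bool)
influenced[:, 0] = True

aug_s, aug_ns, donors, swap = swap_uninfluenced(states, next_states, influenced, rng)
print(aug_s.shape, aug_ns.shape)
```

In this toy setup the action-influenced entity keeps its original state and next state, while the other entities are copied from the donor transition, giving a synthetic transition that pairs the original action with a new combination of entity states.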