Missing data poses distinct challenges in online reinforcement learning (RL) compared to standard tabular data or offline policy learning. Because the agent must impute and act at every time step, imputation cannot be deferred until enough data have accumulated to fit stable imputation models; moreover, future data collection and learning depend on earlier imputations. This paper proposes fully online imputation ensembles. We find that maintaining multiple imputation pathways may help balance the need to capture uncertainty under missingness against the need for efficiency in online settings. We consider several approaches for incorporating these pathways into learning and action selection. Using a Grid World experiment with various types of missingness, we provide preliminary evidence that multiple imputation pathways may be a useful framework for constructing simple and efficient online RL methods for missing data.
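The abstract does not spell out the mechanics of the imputation pathways, so the following is only a minimal sketch of what a fully online imputation ensemble could look like in a tabular Grid World. It assumes K independent pathways, each keeping its own running imputation of missing state coordinates and its own Q-table, with action selection by averaging Q-values across pathways; the grid size, missingness rate, imputation rule, and all hyperparameters are illustrative assumptions rather than the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

GRID = 5                                        # assumed 5x5 Grid World
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right
GOAL = (GRID - 1, GRID - 1)
K = 5                                           # number of imputation pathways (assumed)
P_MISS = 0.2                                    # per-coordinate missingness probability (assumed MCAR)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def step(pos, a):
    """Deterministic Grid World transition with a terminal goal reward."""
    nxt = (min(max(pos[0] + a[0], 0), GRID - 1),
           min(max(pos[1] + a[1], 0), GRID - 1))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def mask(pos):
    """Observation with each coordinate independently missing (None)."""
    return tuple(c if rng.random() > P_MISS else None for c in pos)

class Pathway:
    """One imputation pathway: its own imputations and its own Q-table."""
    def __init__(self):
        self.q = np.zeros((GRID, GRID, len(ACTIONS)))
        self.last = (0, 0)          # last imputed state, reused for imputation

    def impute(self, obs):
        # Simple stochastic imputation: keep observed coordinates, sample
        # missing ones near the pathway's previous imputed value (an
        # assumption; the paper does not commit to a specific imputation model).
        s = tuple(o if o is not None
                  else int(np.clip(self.last[i] + rng.integers(-1, 2), 0, GRID - 1))
                  for i, o in enumerate(obs))
        self.last = s
        return s

    def update(self, s, a, r, s2):
        # Standard tabular Q-learning update on this pathway's imputed states.
        self.q[s][a] += ALPHA * (r + GAMMA * self.q[s2].max() - self.q[s][a])

pathways = [Pathway() for _ in range(K)]

for episode in range(200):
    pos = (0, 0)
    for p in pathways:
        p.last = pos
    states = [p.impute(mask(pos)) for p in pathways]   # one imputation per pathway
    done = False
    while not done:
        # Action selection: average Q-values across pathways (ensemble view).
        mean_q = np.mean([p.q[s] for p, s in zip(pathways, states)], axis=0)
        a = rng.integers(len(ACTIONS)) if rng.random() < EPS else int(mean_q.argmax())
        pos, r, done = step(pos, ACTIONS[a])
        next_states = [p.impute(mask(pos)) for p in pathways]
        for p, s, s2 in zip(pathways, states, next_states):
            p.update(s, a, r, s2)                      # each pathway learns on its own imputations
        states = next_states
```

Averaging Q-values across pathways is only one of the possible ways to fold the ensemble into action selection; voting over greedy actions or sampling a single pathway per step would be equally consistent with the abstract's description.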