In this work, we address the problem of training a Reinforcement Learning agent to follow multiple temporally-extended instructions, expressed in Linear Temporal Logic (LTL), in sub-symbolic environments. Previous multi-task work has mostly relied on prior knowledge of the mapping between raw observations and the symbols appearing in the formulae. We drop this unrealistic assumption by jointly training a multi-task policy and a symbol grounder on the same experience. The symbol grounder is trained in a semi-supervised fashion, via Neural Reward Machines, using only raw observations and sparse rewards. Experiments on vision-based environments show that our method achieves performance comparable to using the true symbol grounding and significantly outperforms state-of-the-art methods for sub-symbolic environments.