Recent work on deep reinforcement learning (DRL) has pointed out that algorithmic information about good policies can be extracted from offline data that lacks explicit information about executed actions. For example, videos of humans or robots may convey a lot of implicit information about rewarding action sequences, but a DRL machine that wants to profit from watching such videos must first learn by itself to identify and recognize the relevant states, actions, and rewards. Our new method, called Deep State Identifier, learns to predict returns from episodes encoded as videos without relying on ground-truth annotations. It then uses a mask-based sensitivity analysis to identify critical states. Extensive experiments showcase the method's potential for understanding and improving agent behavior. The source code and the generated datasets are available at https://github.com/AI-Initiative-KAUST/VideoRLCS.
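To make the idea of sensitivity analysis over an episode concrete, the following is a minimal sketch and not the released implementation. It uses plain occlusion (zeroing one frame at a time) as a simpler stand-in for the mask-based analysis described above, and it assumes a return predictor is already available. The names `toy_return_predictor`, `frame_sensitivity`, and `critical_states`, as well as the toy episode, are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

# Assumption: each frame is flattened to a list of feature values.
Frame = Sequence[float]


def toy_return_predictor(episode: Sequence[Frame]) -> float:
    """Hypothetical stand-in for a trained return predictor.

    It scores an episode by summing the first feature of every frame.
    """
    return sum(frame[0] for frame in episode)


def frame_sensitivity(
    episode: Sequence[Frame],
    predict_return: Callable[[Sequence[Frame]], float],
) -> List[float]:
    """For each frame, measure how much the predicted return changes
    when that frame alone is masked out (replaced by zeros)."""
    baseline = predict_return(episode)
    scores = []
    for t in range(len(episode)):
        masked = [
            [0.0] * len(frame) if i == t else list(frame)
            for i, frame in enumerate(episode)
        ]
        scores.append(abs(baseline - predict_return(masked)))
    return scores


def critical_states(scores: Sequence[float], top_k: int) -> List[int]:
    """Return the indices of the top_k most sensitive frames, in time order."""
    ranked = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
    return sorted(ranked[:top_k])


# Toy episode: four frames with two features each; frame 2 carries most
# of the signal that the toy predictor looks at.
episode = [[0.0, 1.0], [0.1, 0.5], [5.0, 0.2], [0.0, 0.9]]
scores = frame_sensitivity(episode, toy_return_predictor)
critical = critical_states(scores, top_k=1)
print(critical)
```

In this toy setting the frame whose removal changes the predicted return the most is flagged as critical. A learned mask, as the abstract describes, would replace the exhaustive per-frame occlusion used here.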