In recent years, Reward Machines (RMs) have emerged as a simple yet effective automata-based formalism for exposing and exploiting task structure in reinforcement learning. Despite their relevance, little attention has been paid to their security implications or their robustness in adversarial scenarios, most likely because they appeared in the literature only recently. With this thesis, I aim to provide the first analysis of the security of RM-based reinforcement learning techniques, in the hope of motivating further research in the field. I also propose and evaluate a novel class of attacks on RM-based techniques: blinding attacks.