Reinforcement learning (RL) algorithms interact with their environment in a trial-and-error fashion. Such interactions can be expensive, inefficient, and time-consuming when learning takes place on a physical system rather than in a simulation. This work develops new runtime verification techniques to predict when the learning phase has not met, or will not meet, expectations on quality and timeliness. We present three verification properties concerning the quality and timeliness of learning in RL algorithms. For each property, we propose design steps for monitoring and assessing it during the system's operation.