ROVER: Regulator-Driven Robust Temporal Verification of Black-Box Robot Policies

We present a novel, regulator-driven approach to the temporal verification of black-box autonomous robot policies, inspired by real-world certification processes in which regulators often evaluate observable behavior without access to model internals. Central to our method is a regulator-in-the-loop procedure that evaluates execution traces from black-box policies against temporal safety requirements. These requirements, expressed as prioritized Signal Temporal Logic (STL) specifications, characterize how behavior changes over time and encode domain knowledge into the verification process. We use Total Robustness Value (TRV) and Largest Robustness Value (LRV) to quantify average performance and worst-case adherence, respectively, and introduce Average Violation Robustness Value (AVRV) to measure average specification violation. Together, these metrics guide targeted retraining and iterative model improvement. Our approach accommodates diverse temporal safety requirements (e.g., lane-keeping, delayed acceleration, and turn smoothness) that capture persistence, sequencing, and response, and we evaluate it in two distinct domains (a virtual racing game and mobile robot navigation). Across six STL specifications spanning both domains, regulator-guided retraining increased satisfaction rates by an average of 43.8%, with consistent improvement in average performance (TRV) and reduced violation severity (LRV) in half of the specifications. Finally, real-world validation on a TurtleBot3 robot demonstrates a 27% improvement in smooth-navigation satisfaction, yielding smoother paths and stronger compliance with STL-defined temporal safety requirements.
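As a rough illustration of trace-level robustness evaluation, the sketch below scores a few execution traces against a single "always" requirement and aggregates the scores. It is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not define TRV, LRV, or AVRV, so the aggregates here (sum of robustness, minimum robustness, and mean robustness over violating traces) are hypothetical stand-ins that only mirror the roles the abstract assigns to those metrics. The helper names, the lane-offset signal, and the 0.5 bound are likewise invented for illustration.

```python
from statistics import mean


def always_within(signal, bound):
    """Robustness of the STL formula G(|x_t| <= bound) on one trace.

    Standard quantitative semantics: the minimum over time of
    bound - |x_t|. A negative value means the trace violates the formula.
    """
    return min(bound - abs(x) for x in signal)


def summarize(robustness_values):
    """Aggregate per-trace robustness values.

    The keys below are assumed stand-ins, not the paper's definitions
    of TRV, LRV, or AVRV.
    """
    violations = [r for r in robustness_values if r < 0]
    return {
        # Fraction of traces that satisfy the specification.
        "satisfaction_rate": sum(r >= 0 for r in robustness_values)
        / len(robustness_values),
        # Sum of robustness over all traces (overall performance).
        "total_robustness": sum(robustness_values),
        # Lowest robustness over all traces (worst-case adherence).
        "worst_robustness": min(robustness_values),
        # Mean robustness over violating traces only (0.0 if none violate).
        "avg_violation": mean(violations) if violations else 0.0,
    }


# Hypothetical lane-offset traces recorded from a black-box policy.
traces = [
    [0.1, -0.2, 0.3],   # stays within the bound
    [0.0, 0.6, 0.2],    # briefly exceeds the bound
    [-0.9, 0.1, 0.0],   # exceeds the bound by a wider margin
]

rho = [always_within(trace, bound=0.5) for trace in traces]
summary = summarize(rho)
print(rho)
print(summary)
```

Only the recorded signal is needed to compute these scores, which matches the black-box setting: the evaluator never inspects the policy itself. In a retraining loop, traces with the most negative robustness would be natural candidates for targeted data collection, though how the paper selects them is not stated here.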
