This work introduces a preference learning method that ensures adherence to given specifications, with an application to autonomous vehicles. Our approach incorporates the priority ordering of Signal Temporal Logic (STL) formulas describing traffic rules into a learning framework. By leveraging Parametric Weighted Signal Temporal Logic (PWSTL), we formulate the problem of safety-guaranteed preference learning based on pairwise comparisons and propose an approach to solve this learning problem. Our approach finds a feasible valuation for the weights of the given PWSTL formula such that, with these weights, preferred signals have weighted quantitative satisfaction measures greater than their non-preferred counterparts. The feasible valuation of weights given by our approach leads to a weighted STL formula that can be used in correct-and-custom-by-construction controller synthesis. We demonstrate the performance of our method with a pilot human subject study in two different simulated driving scenarios involving a stop sign and a pedestrian crossing. Our approach yields competitive results compared to existing preference learning methods in terms of capturing preferences and notably outperforms them when safety is considered.
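The core learning step, finding weights under which every preferred signal scores higher than its non-preferred counterpart, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two rules (a speed limit and a minimum gap), the thresholds `V_MAX` and `D_MIN`, the weighted-min aggregation used for the conjunction, the toy signals, and the use of plain random search as a stand-in for whatever solver the authors employ. The sketch relies on one property: strictly positive weights preserve the sign of the unweighted robustness, so a rule-violating signal stays negative under any sampled valuation.

```python
import random

V_MAX = 10.0  # hypothetical speed limit (m/s)
D_MIN = 2.0   # hypothetical minimum gap to a pedestrian (m)


def rule_robustness(signal):
    """Robustness of G(speed < V_MAX) and G(gap > D_MIN) for one signal.

    `signal` is a list of (speed, gap) samples; "always" becomes a min over time.
    """
    r_speed = min(V_MAX - speed for speed, _ in signal)
    r_gap = min(gap - D_MIN for _, gap in signal)
    return (r_speed, r_gap)


def weighted_robustness(signal, weights):
    """Weighted robustness of the conjunction (assumed weighted-min form)."""
    return min(w * r for w, r in zip(weights, rule_robustness(signal)))


def find_feasible_weights(pairs, n_samples=10_000, seed=0):
    """Random search for weights that rank every preferred signal higher."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        # Strictly positive weights keep the sign of the unweighted robustness.
        weights = (rng.uniform(0.01, 1.0), rng.uniform(0.01, 1.0))
        if all(
            weighted_robustness(pref, weights) > weighted_robustness(other, weights)
            for pref, other in pairs
        ):
            return weights
    return None  # no feasible valuation found within the sampling budget


# Toy (speed, gap) trajectories; each pair is (preferred, non-preferred).
P1 = [(8.0, 7.0), (9.0, 6.0)]
N1 = [(6.0, 5.0), (7.0, 4.0)]
P2 = [(7.0, 6.0), (8.0, 5.0)]
N2 = [(7.5, 4.0), (7.0, 3.0)]
PAIRS = [(P1, N1), (P2, N2)]
UNSAFE = [(11.0, 6.0), (9.0, 5.0)]  # exceeds the speed limit at the first sample

weights = find_feasible_weights(PAIRS)
```

With unit weights the first pair is ranked the wrong way round, because `P1` has the smaller unweighted robustness. A feasible valuation has to down-weight the gap rule relative to the speed rule to reproduce the stated preference, while `UNSAFE` keeps a negative score under any positive weights.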