This work introduces a preference learning method that ensures adherence to given specifications, with an application to autonomous vehicles. Our approach incorporates the priority ordering of Signal Temporal Logic (STL) formulas describing traffic rules into a learning framework. By leveraging Parametric Weighted Signal Temporal Logic (PWSTL), we formulate the problem of safety-guaranteed preference learning based on pairwise comparisons and propose an approach to solve this learning problem. Our approach finds a feasible valuation for the weights of the given PWSTL formula such that, with these weights, preferred signals have weighted quantitative satisfaction measures greater than their non-preferred counterparts. The feasible valuation of weights given by our approach leads to a weighted STL formula that can be used in correct-and-custom-by-construction controller synthesis. We demonstrate the performance of our method with a pilot human subject study in two different simulated driving scenarios involving a stop sign and a pedestrian crossing. Our approach yields competitive results compared to existing preference learning methods in terms of capturing preferences and notably outperforms them when safety is considered.
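The core learning step described above, finding weights under which every preferred signal scores higher than its non-preferred counterpart, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's formulation: the weighted measure is simplified to a weighted minimum over per-rule robustness margins (not the actual PWSTL semantics), the function names `weighted_robustness` and `find_feasible_weights` are hypothetical, the margins are made-up numbers, and plain random search stands in for whatever solver the authors use.

```python
import random

def weighted_robustness(weights, margins):
    """Simplified stand-in for a weighted satisfaction measure (assumption).

    `margins` holds one robustness margin per rule (positive means the rule
    is satisfied). Because all weights are positive, the result is positive
    exactly when every rule is satisfied, so weighting cannot hide a violation.
    """
    return min(w * r for w, r in zip(weights, margins))

def find_feasible_weights(pairs, n_rules, trials=10000, seed=0):
    """Random search for weights that rank each preferred signal strictly higher.

    `pairs` is a list of (preferred_margins, non_preferred_margins).
    Returns a feasible weight list, or None if no sample satisfied all pairs.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        w = [rng.uniform(0.1, 10.0) for _ in range(n_rules)]
        if all(weighted_robustness(w, p) > weighted_robustness(w, q)
               for p, q in pairs):
            return w
    return None

# Hypothetical pairwise comparisons over two rules (illustrative numbers only).
pairs = [
    ((0.5, 2.0), (1.0, 0.3)),
    ((0.4, 3.0), (0.3, 5.0)),
]
weights = find_feasible_weights(pairs, n_rules=2)
```

In this toy setup the search returns `None` when no sampled valuation is consistent with all comparisons, for example when the same two signals are each marked as preferred over the other.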