Assuring the trustworthiness and safety of AI systems, such as autonomous vehicles (AVs), depends critically on data-related safety properties (e.g., representativeness and completeness) of the datasets used to train and test them. Among these properties, this paper focuses on representativeness: the extent to which the scenario-based data used for training and testing reflect the operational conditions in which the system is designed to operate safely, i.e., the Operational Design Domain (ODD), or which it is expected to encounter, i.e., the Target Operational Domain (TOD). We propose a probabilistic method that quantifies representativeness by comparing the statistical distribution of features encoded by the scenario suites with the corresponding distribution of features representing the TOD, acknowledging that the true TOD distribution is unknown and can only be inferred from limited data. We apply an imprecise Bayesian method to handle limited data and uncertain priors. This formulation produces interval-valued, uncertainty-aware estimates of representativeness rather than a single value. We present a numerical example comparing the distributions of the scenario suite and the inferred TOD across operational categories (weather, road type, time of day, etc.) under dependencies and prior uncertainty. We estimate representativeness both locally (between categories) and globally, as intervals.
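To make the idea of an interval-valued representativeness estimate concrete, the following is a minimal sketch and not the method of the paper. It assumes a single categorical feature, uses Walley's imprecise Dirichlet model (IDM) as one possible imprecise Bayesian treatment of the TOD counts, and scores global representativeness with the overlap coefficient (one minus the total variation distance), which is an illustrative choice. The function names, the toy counts, and the prior strength s = 2 are all hypothetical, and dependencies between categories are ignored.

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]


def idm_intervals(counts: Dict[str, int], s: float = 2.0) -> Dict[str, Interval]:
    """Posterior-mean bounds per category under the imprecise Dirichlet model.

    With N observations and prior strength s, the posterior mean of category k
    lies in [n_k / (N + s), (n_k + s) / (N + s)].
    """
    total = sum(counts.values()) + s
    return {k: (n / total, (n + s) / total) for k, n in counts.items()}


def local_gaps(suite: Dict[str, float], counts: Dict[str, int],
               s: float = 2.0) -> Dict[str, Interval]:
    """Interval for (suite share - TOD share) in each category (assumed local measure)."""
    bounds = idm_intervals(counts, s)
    return {k: (suite[k] - hi, suite[k] - lo) for k, (lo, hi) in bounds.items()}


def overlap_interval(suite: Dict[str, float], counts: Dict[str, int],
                     s: float = 2.0) -> Interval:
    """Bounds on sum_k min(suite_k, tod_k) over all IDM posterior means.

    The IDM posterior means are (n_k + s * t_k) / (N + s) with t on the simplex,
    so a prior mass of s / (N + s) can be distributed freely across categories.
    The overlap is concave in t: its minimum is attained by putting all prior
    mass on one category, its maximum by filling the deficits of the suite.
    """
    total = sum(counts.values()) + s
    slack = s / total  # prior mass that can move between categories
    base = 0.0         # overlap guaranteed by the lower bounds alone
    deficits = []      # how far each lower bound falls short of the suite share
    for k, n in counts.items():
        lower = n / total
        base += min(suite[k], lower)
        deficits.append(max(0.0, suite[k] - lower))
    lowest = base + min(min(d, slack) for d in deficits)
    highest = base + min(slack, sum(deficits))
    return lowest, highest


# Toy data (hypothetical): observed TOD counts and scenario-suite shares for weather.
tod_counts = {"clear": 6, "rain": 3, "snow": 1}
suite_shares = {"clear": 0.5, "rain": 0.25, "snow": 0.25}

print(local_gaps(suite_shares, tod_counts))
print(overlap_interval(suite_shares, tod_counts))
```

For this toy input the global overlap interval is approximately [0.83, 1.0]. The width reflects how little TOD data is available: with more observations the prior mass s / (N + s) shrinks and the interval tightens toward a single value.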