PAC-Based Formal Verification for Out-of-Distribution Data Detection

Cyber-physical systems (CPS) that use learning components, such as autonomous vehicles, are often sensitive to noise and out-of-distribution (OOD) instances encountered at runtime. Safety-critical tasks therefore depend on OOD detection subsystems to restore the CPS to a known state or to interrupt execution before safety is compromised. However, the performance of OOD detectors is difficult to guarantee because it is hard to characterize what makes an instance OOD, especially in high-dimensional unstructured data. To distinguish OOD data from data known to the learning component through training, an emerging technique is to incorporate variational autoencoders (VAEs) into the system and apply classification or anomaly detection techniques to their latent spaces. The rationale is that encoding reduces the size of the data domain, which benefits real-time systems by lowering processing requirements, facilitates feature analysis of unstructured data, and allows more explainable techniques to be implemented. This study places probably approximately correct (PAC) guarantees on OOD detection by using the VAE encoding process to quantify image features and applying conformal constraints over them. These guarantees bound the detection error on unfamiliar instances with user-defined confidence. The bounds are established empirically by sampling the latent probability distribution and evaluating the error with respect to the constraint violations encountered. The guarantee is then verified using data generated from CARLA, an open-source driving simulator.
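
The sampling step described above can be illustrated with a minimal sketch. It is not the study's implementation: it assumes a diagonal Gaussian latent distribution, uses simple per-dimension interval constraints as a stand-in for the conformal constraints, and uses a two-sided Hoeffding bound to pick the sample size. All function names and parameters (`hoeffding_sample_size`, `pac_violation_bound`, `epsilon`, `delta`) are hypothetical.

```python
import math
import random


def hoeffding_sample_size(epsilon, delta):
    """Number of samples n such that the empirical violation rate is within
    epsilon of the true rate with probability at least 1 - delta
    (two-sided Hoeffding bound: n >= ln(2/delta) / (2 * epsilon^2))."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def sample_latent(mu, sigma, rng):
    """Draw one latent vector from an assumed diagonal Gaussian N(mu, sigma^2)."""
    return [rng.gauss(m, s) for m, s in zip(mu, sigma)]


def violates(z, lower, upper):
    """True if any latent coordinate falls outside its interval constraint.
    Assumption: box constraints stand in for the conformal constraints."""
    return any(not (lo <= zi <= hi) for zi, lo, hi in zip(z, lower, upper))


def pac_violation_bound(mu, sigma, lower, upper, epsilon=0.05, delta=0.01, seed=0):
    """Estimate the constraint-violation rate by sampling the latent
    distribution, and return (n, empirical_rate, upper_bound), where
    upper_bound holds with confidence at least 1 - delta."""
    rng = random.Random(seed)
    n = hoeffding_sample_size(epsilon, delta)
    violations = sum(
        violates(sample_latent(mu, sigma, rng), lower, upper) for _ in range(n)
    )
    rate = violations / n
    return n, rate, min(1.0, rate + epsilon)


if __name__ == "__main__":
    # Hypothetical 2-D latent encoding of one image.
    n, rate, bound = pac_violation_bound(
        mu=[0.0, 0.0], sigma=[1.0, 1.0], lower=[-2.0, -2.0], upper=[2.0, 2.0]
    )
    print(f"samples={n}, empirical rate={rate:.3f}, PAC upper bound={bound:.3f}")
```

Here `epsilon` plays the role of the error tolerance and `1 - delta` the user-defined confidence; tightening either one increases the number of latent samples required.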
