Out-of-Distribution (OoD) detection is vital to the reliability of Deep Neural Networks (DNNs). Existing works have shown that Principal Component Analysis (PCA), when applied directly to DNN features, is insufficient for detecting OoD data among In-Distribution (InD) data. This failure suggests that the InD and OoD features cannot be well separated in a linear subspace, but may become separable under proper non-linear mappings. In this work, we leverage the framework of Kernel PCA (KPCA) for OoD detection and seek suitable non-linear kernels that promote the separability between InD and OoD data in the subspace spanned by the principal components. In addition, we adopt explicit feature mappings induced by the task-specific kernels, so that the KPCA reconstruction error of new test samples can be computed efficiently on large-scale data. Extensive theoretical and empirical results on multiple OoD data sets and network structures verify the superiority of our KPCA detector in both efficiency and efficacy, achieving state-of-the-art detection performance.
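The explicit-feature-map pipeline described above can be sketched as follows. This is a minimal illustration assuming a random Fourier feature approximation of an RBF kernel; the paper's actual task-specific kernels are not specified here, and all dimensions, parameters, and function names below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sketch: KPCA reconstruction error via an explicit random
# Fourier feature map z(x) approximating an RBF kernel
# k(x, y) = exp(-gamma * ||x - y||^2). For w ~ N(0, 2*gamma*I) and
# b ~ U(0, 2*pi), E[z(x)^T z(y)] ~= k(x, y).
def rff_map(X, W, b):
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

d, D, q = 16, 256, 32        # feature dim, map dim, #principal components
gamma = 0.05                 # assumed kernel bandwidth parameter
W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

# "Training": PCA in the explicit feature space on (synthetic) InD features
X_train = rng.normal(size=(1000, d))
Z = rff_map(X_train, W, b)
mu = Z.mean(axis=0)
_, _, Vt = np.linalg.svd(Z - mu, full_matrices=False)
P = Vt[:q].T                 # top-q principal directions, shape (D, q)

def ood_score(x):
    # Reconstruction error of a test sample in the KPCA subspace;
    # a larger error indicates the sample is more likely OoD.
    z = rff_map(x[None, :], W, b) - mu
    z_rec = (z @ P) @ P.T
    return float(np.linalg.norm(z - z_rec))

# InD-like sample vs. a shifted (OoD-like) sample
s_in = ood_score(rng.normal(size=d))
s_out = ood_score(rng.normal(size=d) + 5.0)
```

Because the kernel is evaluated through the explicit map, scoring a new sample costs only one matrix-vector product in the feature space rather than kernel evaluations against all training points, which is what makes the approach scale to large data.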