多层置信度评分用于检测分布外样本、对抗性攻击及分布内误分类 (Multi-Layer Confidence Scoring for Detection of Out-of-Distribution Samples, Adversarial Attacks, and In-Distribution Misclassifications) - 专知论文

会员服务 ·

0

置信度 · 样本 · 对抗 · 攻击 · 对抗性攻击 ·

2025 年 12 月 22 日

Multi-Layer Confidence Scoring for Detection of Out-of-Distribution Samples, Adversarial Attacks, and In-Distribution Misclassifications

翻译：多层置信度评分用于检测分布外样本、对抗性攻击及分布内误分类

Lorenzo Capelli,Leandro de Souza Rosa,Gianluca Setti,Mauro Mangia,Riccardo Rovatti

The recent explosive growth in Deep Neural Networks applications raises concerns about the black-box usage of such models, with limited trasparency and trustworthiness in high-stakes domains, which have been crystallized as regulatory requirements such as the European Union Artificial Intelligence Act. While models with embedded confidence metrics have been proposed, such approaches cannot be applied to already existing models without retraining, limiting their broad application. On the other hand, post-hoc methods, which evaluate pre-trained models, focus on solving problems related to improving the confidence in the model's predictions, and detecting Out-Of-Distribution or Adversarial Attacks samples as independent applications. To tackle the limited applicability of already existing methods, we introduce Multi-Layer Analysis for Confidence Scoring (MACS), a unified post-hoc framework that analyzes intermediate activations to produce classification-maps. From the classification-maps, we derive a score applicable for confidence estimation, detecting distributional shifts and adversarial attacks, unifying the three problems in a common framework, and achieving performances that surpass the state-of-the-art approaches in our experiments with the VGG16 and ViTb16 models with a fraction of their computational overhead.

翻译：近年来深度神经网络应用的爆炸式增长引发了对此类模型黑箱使用的担忧，在高风险领域其透明度和可信度有限，这已具体化为欧盟《人工智能法案》等监管要求。虽然已有嵌入置信度指标的模型被提出，但此类方法无法在不重新训练的情况下应用于现有模型，限制了其广泛应用。另一方面，评估预训练模型的事后方法主要关注提升模型预测置信度，以及将检测分布外样本或对抗性攻击样本作为独立应用。为解决现有方法适用性有限的问题，我们提出了用于置信度评分的多层分析框架，这是一个统一的事后分析框架，通过分析中间层激活生成分类图。基于分类图，我们推导出适用于置信度估计、检测分布偏移和对抗性攻击的评分，将这三个问题统一于共同框架中。在使用VGG16和ViTb16模型的实验中，该框架以远低于现有先进方法的计算开销实现了超越其性能的表现。

0

相关内容

置信度

【ICML2025】Proxy-FDA：基于代理的特征分布对齐方法，用于无遗忘地微调视觉基础模型

【ICML2025】Proxy-FDA：基于代理的特征分布对齐方法，用于无遗忘地微调视觉基础模型

专知会员服务

9+阅读 · 2025年6月3日

【ICLR2025】为多模态图像-文本表示可解释性缩小信息瓶颈理论

【ICLR2025】为多模态图像-文本表示可解释性缩小信息瓶颈理论

专知会员服务

15+阅读 · 2025年2月24日

推荐系统中的扩散模型：综述

推荐系统中的扩散模型：综述

专知会员服务

21+阅读 · 2025年1月22日

《分布外泛化评估》综述

《分布外泛化评估》综述

专知会员服务

43+阅读 · 2024年3月6日

机器学习可解释如何客观评估？CMU-Yeh博士论文《可解释机器学习的客观标准》，148页pdf

机器学习可解释如何客观评估？CMU-Yeh博士论文《可解释机器学习的客观标准》，148页pdf

专知会员服务

79+阅读 · 2022年11月23日

【深度迁移学习在图像分类中的应用综述】Deep transfer learning for image classification: a survey

【深度迁移学习在图像分类中的应用综述】Deep transfer learning for image classification: a survey

专知会员服务

25+阅读 · 2022年5月24日

【CVPR 2022】基于实例深度估计的统一深度感知全景分割 PanopticDepth: Per-Instance Depth Estimation for Unified Depth-Aware Panoptic Segmentation

【CVPR 2022】基于实例深度估计的统一深度感知全景分割 PanopticDepth: Per-Instance Depth Estimation for Unified Depth-Aware Panoptic Segmentation

专知会员服务

18+阅读 · 2022年3月19日

我们真的需要深度学习模型来预测时间序列吗? Do We Really Need Deep Learning Models for Time Series Forecasting?

我们真的需要深度学习模型来预测时间序列吗? Do We Really Need Deep Learning Models for Time Series Forecasting?

专知会员服务

37+阅读 · 2022年3月13日

【CVPR2021】深度稳定学习分布外泛化

专知会员服务

30+阅读 · 2021年5月20日

【CVPR 2020 Oral】小样本类增量学习

专知会员服务

112+阅读 · 2020年6月26日

最新最全《深度元学习》2021综述论文，68页pdf，A Survey of Deep Meta-Learning

最新最全《深度元学习》2021综述论文，68页pdf，A Survey of Deep Meta-Learning

专知

11+阅读 · 2021年4月23日

【图神经网络多模态检索】Multi-Modal Retrieval using Graph Neural Networks

【图神经网络多模态检索】Multi-Modal Retrieval using Graph Neural Networks

专知

12+阅读 · 2020年10月9日

【KDD2020-Tutorial】深度学习异常检测，180页ppt

【KDD2020-Tutorial】深度学习异常检测，180页ppt

专知

49+阅读 · 2020年8月28日

Python图像处理，366页pdf，Image Operators Image Processing in Python

Python图像处理，366页pdf，Image Operators Image Processing in Python

专知

15+阅读 · 2020年7月23日

【CVPR 2020 Oral】小样本类增量学习

【CVPR 2020 Oral】小样本类增量学习

专知

20+阅读 · 2020年6月26日

【阿里巴巴-WWW2020】对抗性多模态表示学习的点击率预测，Adversarial Multimodal RL

【阿里巴巴-WWW2020】对抗性多模态表示学习的点击率预测，Adversarial Multimodal RL

专知

11+阅读 · 2020年3月17日

一文详解深度学习在命名实体识别(NER)中的应用

一文详解深度学习在命名实体识别(NER)中的应用

AINLP

24+阅读 · 2018年10月23日

如何用机器学习精准辨别“背景”和“目标”

如何用机器学习精准辨别“背景”和“目标”

论智

10+阅读 · 2018年10月22日

论文浅尝 | 当知识图谱遇上零样本学习——零样本学习综述

论文浅尝 | 当知识图谱遇上零样本学习——零样本学习综述

开放知识图谱

22+阅读 · 2018年9月26日

LibRec 每周算法：LDA主题模型

LibRec 每周算法：LDA主题模型

LibRec智能推荐

29+阅读 · 2017年12月4日

处理效应差异中位数的有效估计

国家自然科学基金

0+阅读 · 2015年12月31日

分布式有监督学习的学习理论

国家自然科学基金

17+阅读 · 2015年12月31日

基于对称识别方法的贝叶斯probit模型稳健性研究

国家自然科学基金

3+阅读 · 2015年12月31日

基于多样化查询的多标记主动学习研究

国家自然科学基金

0+阅读 · 2015年12月31日

非参数核方法的样本外扩展研究

国家自然科学基金

2+阅读 · 2015年12月31日

不确定知识图谱中面向结构查询的众包清洗研究

国家自然科学基金

4+阅读 · 2015年12月31日

基于Spark的大图数据最优子模式匹配查询方法研究

国家自然科学基金

1+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

46+阅读 · 2015年12月31日

关于面板(纵向）数据的动态统计分析

国家自然科学基金

0+阅读 · 2014年12月31日

一般误差分布下若干半参数模型的复合分位数方法

国家自然科学基金

0+阅读 · 2014年12月31日

Deep Semi-Supervised Survival Analysis for Predicting Cancer Prognosis

Arxiv

0+阅读 · 1月28日

Dynamic Expert-Guided Model Averaging for Causal Discovery

Arxiv

0+阅读 · 1月23日

White-Box Sensitivity Auditing with Steering Vectors

Arxiv

0+阅读 · 1月23日

Fairness-informed Pareto Optimization : An Efficient Bilevel Framework

Arxiv

0+阅读 · 1月22日

Enhancing Few-Shot Out-of-Distribution Detection via the Refinement of Foreground and Background

Arxiv

0+阅读 · 1月21日

Fairness-informed Pareto Optimization : An Efficient Bilevel Framework

Arxiv

0+阅读 · 1月19日

SD-RAG: A Prompt-Injection-Resilient Framework for Selective Disclosure in Retrieval-Augmented Generation

Arxiv

0+阅读 · 1月16日

LIBERTy: A Causal Framework for Benchmarking Concept-Based Explanations of LLMs with Structural Counterfactuals

Arxiv

0+阅读 · 1月15日

Towards Online Malware Detection using Process Resource Utilization Metrics

Arxiv

0+阅读 · 1月15日

Out of Distribution, Out of Luck: How Well Can LLMs Trained on Vulnerability Datasets Detect Top 25 CWE Weaknesses?

Arxiv

0+阅读 · 1月15日

VIP会员

文章信息

相关主题

对抗性攻击

相关VIP内容

【ICML2025】Proxy-FDA：基于代理的特征分布对齐方法，用于无遗忘地微调视觉基础模型

【ICML2025】Proxy-FDA：基于代理的特征分布对齐方法，用于无遗忘地微调视觉基础模型

专知会员服务

9+阅读 · 2025年6月3日

【ICLR2025】为多模态图像-文本表示可解释性缩小信息瓶颈理论

【ICLR2025】为多模态图像-文本表示可解释性缩小信息瓶颈理论

专知会员服务

15+阅读 · 2025年2月24日

推荐系统中的扩散模型：综述

推荐系统中的扩散模型：综述

专知会员服务

21+阅读 · 2025年1月22日

《分布外泛化评估》综述

《分布外泛化评估》综述

专知会员服务

43+阅读 · 2024年3月6日

机器学习可解释如何客观评估？CMU-Yeh博士论文《可解释机器学习的客观标准》，148页pdf

机器学习可解释如何客观评估？CMU-Yeh博士论文《可解释机器学习的客观标准》，148页pdf

专知会员服务

79+阅读 · 2022年11月23日

【深度迁移学习在图像分类中的应用综述】Deep transfer learning for image classification: a survey

【深度迁移学习在图像分类中的应用综述】Deep transfer learning for image classification: a survey

专知会员服务

25+阅读 · 2022年5月24日

【CVPR 2022】基于实例深度估计的统一深度感知全景分割 PanopticDepth: Per-Instance Depth Estimation for Unified Depth-Aware Panoptic Segmentation

【CVPR 2022】基于实例深度估计的统一深度感知全景分割 PanopticDepth: Per-Instance Depth Estimation for Unified Depth-Aware Panoptic Segmentation

专知会员服务

18+阅读 · 2022年3月19日

我们真的需要深度学习模型来预测时间序列吗? Do We Really Need Deep Learning Models for Time Series Forecasting?

我们真的需要深度学习模型来预测时间序列吗? Do We Really Need Deep Learning Models for Time Series Forecasting?

专知会员服务

37+阅读 · 2022年3月13日

【CVPR2021】深度稳定学习分布外泛化

专知会员服务

30+阅读 · 2021年5月20日

【CVPR 2020 Oral】小样本类增量学习

专知会员服务

112+阅读 · 2020年6月26日

热门VIP内容

开通专知VIP会员享更多权益服务

《无人机与战争：被忽视的环境影响及无人机保护潜力》

俄罗斯规划未来无人机驱动军队

《整合杀伤链：一个用于边缘目标验证与战术推理的零样本框架》最新资料

《人工智能、武器与影响力：前沿模型在模拟核危机中展现复杂推理》2026最新46页报告

相关资讯

最新最全《深度元学习》2021综述论文，68页pdf，A Survey of Deep Meta-Learning

最新最全《深度元学习》2021综述论文，68页pdf，A Survey of Deep Meta-Learning

专知

11+阅读 · 2021年4月23日

【图神经网络多模态检索】Multi-Modal Retrieval using Graph Neural Networks

【图神经网络多模态检索】Multi-Modal Retrieval using Graph Neural Networks

专知

12+阅读 · 2020年10月9日

【KDD2020-Tutorial】深度学习异常检测，180页ppt

【KDD2020-Tutorial】深度学习异常检测，180页ppt

专知

49+阅读 · 2020年8月28日

Python图像处理，366页pdf，Image Operators Image Processing in Python

Python图像处理，366页pdf，Image Operators Image Processing in Python

专知

15+阅读 · 2020年7月23日

【CVPR 2020 Oral】小样本类增量学习

【CVPR 2020 Oral】小样本类增量学习

专知

20+阅读 · 2020年6月26日

【阿里巴巴-WWW2020】对抗性多模态表示学习的点击率预测，Adversarial Multimodal RL

【阿里巴巴-WWW2020】对抗性多模态表示学习的点击率预测，Adversarial Multimodal RL

专知

11+阅读 · 2020年3月17日

一文详解深度学习在命名实体识别(NER)中的应用

一文详解深度学习在命名实体识别(NER)中的应用

AINLP

24+阅读 · 2018年10月23日

如何用机器学习精准辨别“背景”和“目标”

如何用机器学习精准辨别“背景”和“目标”

论智

10+阅读 · 2018年10月22日

论文浅尝 | 当知识图谱遇上零样本学习——零样本学习综述

论文浅尝 | 当知识图谱遇上零样本学习——零样本学习综述

开放知识图谱

22+阅读 · 2018年9月26日

LibRec 每周算法：LDA主题模型

LibRec 每周算法：LDA主题模型

LibRec智能推荐

29+阅读 · 2017年12月4日

相关论文

Deep Semi-Supervised Survival Analysis for Predicting Cancer Prognosis

Arxiv

0+阅读 · 1月28日

Dynamic Expert-Guided Model Averaging for Causal Discovery

Arxiv

0+阅读 · 1月23日

White-Box Sensitivity Auditing with Steering Vectors

Arxiv

0+阅读 · 1月23日

Fairness-informed Pareto Optimization : An Efficient Bilevel Framework

Arxiv

0+阅读 · 1月22日

Enhancing Few-Shot Out-of-Distribution Detection via the Refinement of Foreground and Background

Arxiv

0+阅读 · 1月21日

Fairness-informed Pareto Optimization : An Efficient Bilevel Framework

Arxiv

0+阅读 · 1月19日

SD-RAG: A Prompt-Injection-Resilient Framework for Selective Disclosure in Retrieval-Augmented Generation

Arxiv

0+阅读 · 1月16日

LIBERTy: A Causal Framework for Benchmarking Concept-Based Explanations of LLMs with Structural Counterfactuals

Arxiv

0+阅读 · 1月15日

Towards Online Malware Detection using Process Resource Utilization Metrics

Arxiv

0+阅读 · 1月15日

Out of Distribution, Out of Luck: How Well Can LLMs Trained on Vulnerability Datasets Detect Top 25 CWE Weaknesses?

Arxiv

0+阅读 · 1月15日

相关基金

处理效应差异中位数的有效估计

国家自然科学基金

0+阅读 · 2015年12月31日

分布式有监督学习的学习理论

国家自然科学基金

17+阅读 · 2015年12月31日

基于对称识别方法的贝叶斯probit模型稳健性研究

国家自然科学基金

3+阅读 · 2015年12月31日

基于多样化查询的多标记主动学习研究

国家自然科学基金

0+阅读 · 2015年12月31日

非参数核方法的样本外扩展研究

国家自然科学基金

2+阅读 · 2015年12月31日

不确定知识图谱中面向结构查询的众包清洗研究

国家自然科学基金

4+阅读 · 2015年12月31日

基于Spark的大图数据最优子模式匹配查询方法研究

国家自然科学基金

1+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

46+阅读 · 2015年12月31日

关于面板(纵向）数据的动态统计分析

国家自然科学基金

0+阅读 · 2014年12月31日

一般误差分布下若干半参数模型的复合分位数方法

国家自然科学基金

0+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员