
Topics: bug reports · security bugs · identification · few-shot learning

Few-shot learning for security bug report identification


Security bug reports must be identified promptly to minimize the window of vulnerability in software systems. Traditional machine learning (ML) techniques for identifying security bug reports rely heavily on large amounts of labeled data. In practice, however, labeled security bug reports are scarce, which leads to poor model performance and limited applicability in real-world settings. In this study, we propose a few-shot learning-based technique that identifies security bug reports from limited labeled data. We employ SetFit, a state-of-the-art few-shot learning framework that combines sentence transformers with contrastive learning and parameter-efficient fine-tuning. The model is trained on a small labeled dataset of bug reports and evaluated on its ability to classify reports as security-related or non-security-related. Our approach achieves an AUC of up to 0.865 and outperforms the traditional ML baselines on all of the evaluated datasets. These results suggest that SetFit-based few-shot learning is a promising alternative to traditional ML techniques for identifying security bug reports: it enables efficient model development with minimal annotation effort, making it well suited to scenarios where labeled data is scarce.
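The abstract does not show the training pipeline, so the following is only a minimal, standard-library sketch of two ideas it mentions: building contrastive pairs from a handful of labeled reports, and scoring a classifier by AUC. The function names `make_contrastive_pairs` and `roc_auc`, the toy reports, and the scores are all hypothetical. The exhaustive pairing is a simplification chosen for illustration and is not claimed to match how SetFit or the authors sample pairs.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Build (text_a, text_b, similarity) triples from a few labeled examples.

    Same-label pairs get similarity 1.0 and different-label pairs get 0.0,
    which is the kind of supervision a contrastive objective consumes.
    Assumption: every unordered pair is used (no sampling).
    """
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        similarity = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((texts[i], texts[j], similarity))
    return pairs


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = security bug report, 0 = non-security bug report.
reports = [
    "Buffer overflow when parsing a crafted header",
    "Session token is leaked in the debug log",
    "Button label is misaligned on the settings page",
    "Typo in the installation guide",
]
labels = [1, 1, 0, 0]

pairs = make_contrastive_pairs(reports, labels)
print(len(pairs))  # 4 reports give 6 unordered pairs

# Hypothetical classifier scores for the same four reports.
print(roc_auc(labels, [0.9, 0.4, 0.5, 0.1]))  # 3 of 4 positive/negative pairs ranked correctly
```

Even four labeled reports yield six training pairs, which illustrates why pair-based contrastive fine-tuning can make use of very small labeled sets. AUC is threshold-free, so it suits the imbalanced setting in which security reports are a small minority.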

