LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations - 专知论文

会员服务 ·

0

Prompt · Performer · 代码 · 数据集 · MoDELS ·

2023 年 3 月 16 日

LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations

翻译：LLMSecEval：面向安全评估的自然语言提示数据集

Catherine Tony,Markus Mutas,Nicolás E. Díaz Ferreyra,Riccardo Scandariato

from arxiv, Accepted at MSR '23 Data and Tool Showcase Track

Large Language Models (LLMs) like Codex are powerful tools for performing code completion and code generation tasks as they are trained on billions of lines of code from publicly available sources. Moreover, these models are capable of generating code snippets from Natural Language (NL) descriptions by learning languages and programming practices from public GitHub repositories. Although LLMs promise an effortless NL-driven deployment of software applications, the security of the code they generate has not been extensively investigated nor documented. In this work, we present LLMSecEval, a dataset containing 150 NL prompts that can be leveraged for assessing the security performance of such models. Such prompts are NL descriptions of code snippets prone to various security vulnerabilities listed in MITRE's Top 25 Common Weakness Enumeration (CWE) ranking. Each prompt in our dataset comes with a secure implementation example to facilitate comparative evaluations against code produced by LLMs. As a practical application, we show how LLMSecEval can be used for evaluating the security of snippets automatically generated from NL descriptions.

翻译：大型语言模型（LLMs）如Codex，因从公开来源的数万亿行代码中训练而成，成为执行代码补全与代码生成任务的强大工具。此外，这些模型通过从公共GitHub仓库学习语言与编程实践，能够根据自然语言描述生成代码片段。尽管LLMs承诺基于自然语言驱动的软件应用开发将变得简便，但其生成代码的安全性尚未得到广泛研究或记录。本文提出LLMSecEval数据集，包含150个自然语言提示，可用于评估此类模型的安全性能。这些提示是易受MITRE Top 25通用弱点枚举排名中各类安全漏洞影响的代码片段的自然语言描述。数据集中每个提示均附有安全实现示例，以便与LLMs生成的代码进行对比评估。作为实际应用，我们展示了如何利用LLMSecEval评估从自然语言描述自动生成的代码片段的安全性。

0

相关内容

Prompt

《用于代码弱点识别的 LLVM 中间表示》CMU

《用于代码弱点识别的 LLVM 中间表示》CMU

专知会员服务

14+阅读 · 2022年12月12日

【CVPR 2022】跨模态检索的协同双流视觉-语言前训练模型，COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

【CVPR 2022】跨模态检索的协同双流视觉-语言前训练模型，COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

专知会员服务

13+阅读 · 2022年3月12日

【硬核书】树与网络上的概率，716页pdf

【硬核书】树与网络上的概率，716页pdf

专知会员服务

77+阅读 · 2021年12月8日

神经网络与形式语言综述，12页pdf，A Survey of Neural Networks and Formal Languages

神经网络与形式语言综述，12页pdf，A Survey of Neural Networks and Formal Languages

专知会员服务

21+阅读 · 2020年6月4日

1750亿参数！GPT-3来了！31位作者，OpenAI发布小样本学习器语言模型

1750亿参数！GPT-3来了！31位作者，OpenAI发布小样本学习器语言模型

专知会员服务

73+阅读 · 2020年5月30日

【ACL2020】Span-ConveRT：预训练对话表示小样本跨度提取，Span-ConveRT: Few-shot Span Extraction for Dialog with Pretrained Conversational Representations

【ACL2020】Span-ConveRT：预训练对话表示小样本跨度提取，Span-ConveRT: Few-shot Span Extraction for Dialog with Pretrained Conversational Representations

专知会员服务

17+阅读 · 2020年5月19日

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

专知会员服务

32+阅读 · 2020年2月21日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

【干货书】基于统计和机器学习的实用时间序列分析预测，Time Series Analysis Prediction

【干货书】基于统计和机器学习的实用时间序列分析预测，Time Series Analysis Prediction

专知

18+阅读 · 2022年4月9日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

LibRec 精选：推荐系统的常用数据集

LibRec 精选：推荐系统的常用数据集

LibRec智能推荐

17+阅读 · 2019年2月15日

【泡泡一分钟】用于评估视觉惯性里程计的TUM VI数据集

【泡泡一分钟】用于评估视觉惯性里程计的TUM VI数据集

泡泡机器人SLAM

11+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】图像分类必读开创性论文汇总

【推荐】图像分类必读开创性论文汇总

机器学习研究会

14+阅读 · 2017年8月15日

软件安全性分析的关键技术与工具

国家自然科学基金

0+阅读 · 2014年12月31日

纳米材料性质定量分析中的反问题

国家自然科学基金

1+阅读 · 2014年12月31日

外包数据的密文存储及查询的关键技术研究

国家自然科学基金

1+阅读 · 2013年12月31日

综合InSAR与GPS江苏沿海湿地储水量变化监测研究

国家自然科学基金

0+阅读 · 2012年12月31日

典型内陆盆地地下水系统演化及其生态响应研究

国家自然科学基金

0+阅读 · 2012年12月31日

影响HbH-CS病表型多样性的甲基化基因位点的研究

国家自然科学基金

0+阅读 · 2012年12月31日

低氧对阿尔茨海默病发病影响及表观遗传学机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

动力锂离子电池正极材料Li1-xMyVOPO4/C的制备及性能

国家自然科学基金

0+阅读 · 2011年12月31日

编码密码学中若干组合对象研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于QAR记录数据和飞行员测试数据的民航飞行员飞行综合素质与飞行操作特征的相关性研究

国家自然科学基金

0+阅读 · 2009年12月31日

No More Manual Tests? Evaluating and Improving ChatGPT for Unit Test Generation

Arxiv

0+阅读 · 2023年5月7日

OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese

Arxiv

1+阅读 · 2023年5月7日

Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model

Arxiv

0+阅读 · 2023年5月7日

Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models

Arxiv

0+阅读 · 2023年5月6日

Visualization in the Era of Artificial Intelligence: Experiments for Creating Structural Visualizations by Prompting Large Language Models

Arxiv

0+阅读 · 2023年5月5日

Panda LLM: Training Data and Evaluation for Open-Sourced Chinese Instruction-Following Large Language Models

Arxiv

0+阅读 · 2023年5月4日

Caption Anything: Interactive Image Description with Diverse Multimodal Controls

Arxiv

0+阅读 · 2023年5月4日

How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning

Arxiv

0+阅读 · 2023年5月4日

Language, Time Preferences, and Consumer Behavior: Evidence from Large Language Models

Arxiv

0+阅读 · 2023年5月4日

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

Arxiv

22+阅读 · 2023年5月3日

VIP会员

文章信息

相关主题

最新内容

打造“新蛛网”模式与高科技动员：如何让保卫国家变得有吸引力

打造“新蛛网”模式与高科技动员：如何让保卫国家变得有吸引力

专知会员服务

0+阅读 · 21分钟前

“蛛网”行动一周年：远程无人机战争

“蛛网”行动一周年：远程无人机战争

专知会员服务

0+阅读 · 31分钟前

加沙、乌克兰和伊朗冲突：人工智能如何改变冲突

加沙、乌克兰和伊朗冲突：人工智能如何改变冲突

专知会员服务

0+阅读 · 39分钟前

为何“第一次人工智能战争（美以伊冲突）”仍是人类主导的斗争

为何“第一次人工智能战争（美以伊冲突）”仍是人类主导的斗争

专知会员服务

0+阅读 · 45分钟前

【剑桥博士论文】智能体-环境协同优化

【剑桥博士论文】智能体-环境协同优化

专知会员服务

5+阅读 · 6月9日

ACL 2026综述｜多模态基础模型测试时扩展：生成与推理统一框架

ACL 2026综述｜多模态基础模型测试时扩展：生成与推理统一框架

专知会员服务

3+阅读 · 6月9日

《面向国防应用的无人机选型：一种对比性多模糊多准则决策框架》

《面向国防应用的无人机选型：一种对比性多模糊多准则决策框架》

专知会员服务

9+阅读 · 6月9日

无人机战争：从乌克兰到中东战场的沙希德（Shahed）无人机分析

无人机战争：从乌克兰到中东战场的沙希德（Shahed）无人机分析

专知会员服务

6+阅读 · 6月9日

为初级军官战术训练设计生成式人工智能平台

为初级军官战术训练设计生成式人工智能平台

专知会员服务

7+阅读 · 6月9日

《美空军条令出版物 3-40，反大规模杀伤性武器作战》

《美空军条令出版物 3-40，反大规模杀伤性武器作战》

专知会员服务

6+阅读 · 6月9日

《美军条令：作战伤员后送保障》

《美军条令：作战伤员后送保障》

专知会员服务

5+阅读 · 6月9日

《美空军条令出版物 4-0，维持》

《美空军条令出版物 4-0，维持》

专知会员服务

5+阅读 · 6月9日

《通过自然语言与强化学习奖励机制将军事条令与目标融入AI智能体》

《通过自然语言与强化学习奖励机制将军事条令与目标融入AI智能体》

专知会员服务

10+阅读 · 6月9日

《基于DIJKSTRA最短路径算法在AFSIM框架中实现高效动态威胁规避路径规划》

《基于DIJKSTRA最短路径算法在AFSIM框架中实现高效动态威胁规避路径规划》

专知会员服务

4+阅读 · 6月9日

《修正错误与改进设计：运用数据耕耘支持基于智能体的军事仿真模型验证与确认》

《修正错误与改进设计：运用数据耕耘支持基于智能体的军事仿真模型验证与确认》

专知会员服务

4+阅读 · 6月9日

相关VIP内容

《用于代码弱点识别的 LLVM 中间表示》CMU

《用于代码弱点识别的 LLVM 中间表示》CMU

专知会员服务

14+阅读 · 2022年12月12日

【CVPR 2022】跨模态检索的协同双流视觉-语言前训练模型，COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

【CVPR 2022】跨模态检索的协同双流视觉-语言前训练模型，COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

专知会员服务

13+阅读 · 2022年3月12日

【硬核书】树与网络上的概率，716页pdf

【硬核书】树与网络上的概率，716页pdf

专知会员服务

77+阅读 · 2021年12月8日

神经网络与形式语言综述，12页pdf，A Survey of Neural Networks and Formal Languages

神经网络与形式语言综述，12页pdf，A Survey of Neural Networks and Formal Languages

专知会员服务

21+阅读 · 2020年6月4日

1750亿参数！GPT-3来了！31位作者，OpenAI发布小样本学习器语言模型

1750亿参数！GPT-3来了！31位作者，OpenAI发布小样本学习器语言模型

专知会员服务

73+阅读 · 2020年5月30日

【ACL2020】Span-ConveRT：预训练对话表示小样本跨度提取，Span-ConveRT: Few-shot Span Extraction for Dialog with Pretrained Conversational Representations

【ACL2020】Span-ConveRT：预训练对话表示小样本跨度提取，Span-ConveRT: Few-shot Span Extraction for Dialog with Pretrained Conversational Representations

专知会员服务

17+阅读 · 2020年5月19日

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

专知会员服务

32+阅读 · 2020年2月21日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

加沙、乌克兰和伊朗冲突：人工智能如何改变冲突

【剑桥博士论文】智能体-环境协同优化

“蛛网”行动一周年：远程无人机战争

为何“第一次人工智能战争（美以伊冲突）”仍是人类主导的斗争

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

【干货书】基于统计和机器学习的实用时间序列分析预测，Time Series Analysis Prediction

【干货书】基于统计和机器学习的实用时间序列分析预测，Time Series Analysis Prediction

专知

18+阅读 · 2022年4月9日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

LibRec 精选：推荐系统的常用数据集

LibRec 精选：推荐系统的常用数据集

LibRec智能推荐

17+阅读 · 2019年2月15日

【泡泡一分钟】用于评估视觉惯性里程计的TUM VI数据集

【泡泡一分钟】用于评估视觉惯性里程计的TUM VI数据集

泡泡机器人SLAM

11+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】图像分类必读开创性论文汇总

【推荐】图像分类必读开创性论文汇总

机器学习研究会

14+阅读 · 2017年8月15日

相关论文

No More Manual Tests? Evaluating and Improving ChatGPT for Unit Test Generation

Arxiv

0+阅读 · 2023年5月7日

OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese

Arxiv

1+阅读 · 2023年5月7日

Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model

Arxiv

0+阅读 · 2023年5月7日

Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models

Arxiv

0+阅读 · 2023年5月6日

Visualization in the Era of Artificial Intelligence: Experiments for Creating Structural Visualizations by Prompting Large Language Models

Arxiv

0+阅读 · 2023年5月5日

Panda LLM: Training Data and Evaluation for Open-Sourced Chinese Instruction-Following Large Language Models

Arxiv

0+阅读 · 2023年5月4日

Caption Anything: Interactive Image Description with Diverse Multimodal Controls

Arxiv

0+阅读 · 2023年5月4日

How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning

Arxiv

0+阅读 · 2023年5月4日

Language, Time Preferences, and Consumer Behavior: Evidence from Large Language Models

Arxiv

0+阅读 · 2023年5月4日

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

Arxiv

22+阅读 · 2023年5月3日

相关基金

软件安全性分析的关键技术与工具

国家自然科学基金

0+阅读 · 2014年12月31日

纳米材料性质定量分析中的反问题

国家自然科学基金

1+阅读 · 2014年12月31日

外包数据的密文存储及查询的关键技术研究

国家自然科学基金

1+阅读 · 2013年12月31日

综合InSAR与GPS江苏沿海湿地储水量变化监测研究

国家自然科学基金

0+阅读 · 2012年12月31日

典型内陆盆地地下水系统演化及其生态响应研究

国家自然科学基金

0+阅读 · 2012年12月31日

影响HbH-CS病表型多样性的甲基化基因位点的研究

国家自然科学基金

0+阅读 · 2012年12月31日

低氧对阿尔茨海默病发病影响及表观遗传学机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

动力锂离子电池正极材料Li1-xMyVOPO4/C的制备及性能

国家自然科学基金

0+阅读 · 2011年12月31日

编码密码学中若干组合对象研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于QAR记录数据和飞行员测试数据的民航飞行员飞行综合素质与飞行操作特征的相关性研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员