AI coding assistants produce vulnerable code in 45\% of security-relevant scenarios~\cite{veracode2025}, yet no public training dataset teaches both traditional web security and AI/ML-specific defenses in a format suitable for instruction tuning. We present SecureCode, a production-grade dataset of 2,185 multi-turn security training examples spanning two domains: web application security (1,435 examples covering the OWASP Top 10 2021 across 11 languages and 9 frameworks, 100\% grounded in documented CVEs and security incidents) and AI/ML security (750 examples covering all 10 OWASP LLM Top 10 2025 categories across more than 40 frameworks, including LangChain, OpenAI, and Hugging Face). Every example follows a 4-turn conversational structure -- feature request; vulnerable and secure implementations with attack demonstrations; advanced probing; and defense-in-depth operational guidance -- designed for direct use in instruction tuning pipelines. Quality assurance combines automated structural validation with multi-agent review from seven specialist AI perspectives (more than 10{,}500 assessments) and an 8-phase remediation pipeline, producing a rubric-calibrated mean quality score of 93.8/100 ($\sigma = 0.93$) for the AI/ML component. Each example provides SIEM integration strategies, infrastructure hardening recommendations, and testing approaches using production frameworks. We release the unified dataset on Hugging Face with domain-specific loading configurations (web, aiml, default), alongside eight fine-tuned open-source models (3B--20B parameters, QLoRA), and an evaluation framework with four security-specific metrics. To our knowledge, SecureCode is the first public dataset that jointly provides OWASP Top 10 2021 web coverage and OWASP LLM Top 10 2025 AI/ML coverage in a unified conversational schema suitable for instruction tuning.