Content composition vulnerabilities remain among the most prevalent and persistent classes of security weakness in deployed software. Prior mitigations, including developer training, static analysis tools, and domain-specific template languages, each face diminishing returns; AI code generation inherits these limitations and introduces new ones, reproducing insecure patterns from training data and lacking reliable context for self-correction. This paper introduces a general framework for secure content composition that extends across content languages and integrates directly into general-purpose programming languages via additive changes to string expression syntax. We define a language design goal of minimizing the lexical distance between secure and insecure idioms, and show that this goal admits practical compilation strategies: static analyses specified in terms of dynamic semantics, runtime performance approaching naïve string concatenation, and developer-facing diagnostics surfaced as compile-time errors or warnings. The approach enables an effective division of labor: security engineers encode composition hazards in libraries once; developers and AI coding agents select the appropriate library primitive to implement features correctly without needing to internalize specialist security knowledge; compiler diagnostics provide objective, position-keyed feedback that grounds both human review and iterative AI self-correction; and security responders focus on keeping libraries current rather than auditing ad-hoc security decisions distributed across a codebase.
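As a rough illustration of the division of labor described in the abstract, the following Python sketch contrasts naïve string concatenation with a library primitive that escapes interpolated values. It is not the paper's proposed syntax: the names `TrustedHtml` and `html_template` are hypothetical, and an ordinary function call stands in for the string-expression syntax the paper proposes, so the lexical distance between the two idioms is larger here than the design goal would allow. Only `html.escape` from the Python standard library is a real API.

```python
from html import escape


class TrustedHtml(str):
    """Hypothetical marker type for markup the library has already vetted."""


def html_template(literals, *values):
    """Interleave trusted literal chunks with escaped interpolated values.

    Assumption for this sketch: `literals` come from the program text and are
    trusted, while `values` may carry attacker-controlled data.
    """
    if len(literals) != len(values) + 1:
        raise ValueError("expected exactly one more literal chunk than values")
    out = [literals[0]]
    for value, literal in zip(values, literals[1:]):
        if isinstance(value, TrustedHtml):
            # Already composed by this library, so it is not escaped again.
            out.append(value)
        else:
            # Untrusted data is escaped for an HTML text or attribute context.
            out.append(escape(str(value), quote=True))
        out.append(literal)
    return TrustedHtml("".join(out))


user_input = "<script>alert(1)</script>"

# Insecure idiom: the payload reaches the output unchanged.
insecure = "<p>Hello, " + user_input + "</p>"

# Secure idiom: same shape, but the library decides how to encode the value.
secure = html_template(["<p>Hello, ", "</p>"], user_input)

# Composed fragments nest without double escaping.
nested = html_template(["<div>", "</div>"], secure)
```

In this sketch the escaping rule lives in one library function, so the calling code makes no security decision of its own. A compiler-integrated version, as the abstract describes, could additionally report misuse at a specific source position instead of relying on a runtime check.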