Large Language Models (LLMs) are increasingly used for automated software development, making their ability to preserve secure coding practices critical. In practice, however, many security requirements are implicit or underspecified, whereas usability requirements are explicit and high-signal. This asymmetry motivates our investigation of usability pressure as a practical attack surface: realistic usability-oriented requirements (e.g., new features, performance constraints, or simplicity demands) can cause coding LLMs to satisfy explicit usability goals while silently dropping implicit security constraints, a form of reward hacking. We formalize this threat as UPAttack and propose U-SPLOIT, an automated framework that crafts UPAttacks in three steps: it (i) selects tasks on which a model initially produces secure code, (ii) synthesizes usability pressures by identifying the usability rewards of insecure alternatives along three vectors (Functionality, Implementation, and Trade-off), and (iii) verifies security regression using both existing test cases and dynamically generated exploit payloads. Across 75 seed scenarios (25 CWEs × 3 cases) spanning multiple languages (Python, C, and JavaScript), U-SPLOIT achieves attack success rates of up to 98.1% on multiple state-of-the-art models (e.g., GPT-5.2-chat and Gemini-3-Flash-Preview).
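The three-step pipeline described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the U-SPLOIT implementation: the names `Task`, `AttackResult`, `run_pipeline`, and `attack_success_rate` are hypothetical, the model, security check, and pressure synthesizer are passed in as plain callables, and the attack success rate is assumed to be the fraction of pressured attempts that regress (the abstract does not give the exact definition). The toy stubs at the end stand in for a real model and real exploit verification.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# The three pressure vectors named in the abstract.
VECTORS = ("Functionality", "Implementation", "Trade-off")


@dataclass(frozen=True)
class Task:
    """Hypothetical seed scenario: an identifier plus the coding prompt."""
    task_id: str
    prompt: str


@dataclass(frozen=True)
class AttackResult:
    """Hypothetical record of one pressured attempt."""
    task_id: str
    vector: str
    regressed: bool  # True if the pressured output is no longer secure


def run_pipeline(
    tasks: Iterable[Task],
    generate: Callable[[str], str],            # prompt -> generated code
    is_secure: Callable[[Task, str], bool],    # stand-in for tests + exploits
    add_pressure: Callable[[Task, str], str],  # (task, vector) -> new prompt
) -> List[AttackResult]:
    results: List[AttackResult] = []
    for task in tasks:
        # (i) Keep only tasks where the unpressured output is secure.
        if not is_secure(task, generate(task.prompt)):
            continue
        for vector in VECTORS:
            # (ii) Synthesize a usability-pressured prompt for this vector.
            pressured_prompt = add_pressure(task, vector)
            # (iii) Regenerate and check whether security regressed.
            code = generate(pressured_prompt)
            results.append(
                AttackResult(task.task_id, vector, not is_secure(task, code))
            )
    return results


def attack_success_rate(results: List[AttackResult]) -> float:
    """Assumed definition: share of pressured attempts that regressed."""
    if not results:
        return 0.0
    return sum(r.regressed for r in results) / len(results)


# Toy stubs (purely illustrative): the "model" switches to an insecure
# pattern only when the prompt asks for a shorter implementation.
def toy_generate(prompt: str) -> str:
    return "string_concat_query" if "shorter" in prompt else "parameterized_query"


def toy_is_secure(task: Task, code: str) -> bool:
    return code == "parameterized_query"


def toy_add_pressure(task: Task, vector: str) -> str:
    suffix = " and make it shorter" if vector == "Trade-off" else ""
    return task.prompt + suffix


if __name__ == "__main__":
    seeds = [Task("t1", "look up a user by name")]
    outcome = run_pipeline(seeds, toy_generate, toy_is_secure, toy_add_pressure)
    for r in outcome:
        print(r.task_id, r.vector, "regressed" if r.regressed else "secure")
    print("attack success rate:", attack_success_rate(outcome))
```

Passing the model and the security oracle as callables keeps the sketch independent of any particular LLM API or test harness; in a real setting `is_secure` would wrap both the existing test cases and the generated exploit payloads.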