LLM agents often rely on Skills to describe available tools and recommended procedures. We study a hidden-comment prompt injection risk in this documentation layer: when a Markdown Skill is rendered to HTML, HTML comment blocks can become invisible to human reviewers, yet the raw text may still be supplied verbatim to the model. In experiments, we find that DeepSeek-V3.2 and GLM-4.5-Air can be influenced by malicious instructions embedded in a hidden comment appended to an otherwise legitimate Skill, yielding outputs that contain sensitive tool intentions. A short defensive system prompt that treats Skills as untrusted and forbids sensitive actions prevents these malicious tool calls and instead surfaces the suspicious hidden instructions.
翻译:大语言模型智能体通常依赖“技能”来描述可用工具和推荐流程。本研究探讨了该文档层中一种隐藏注释提示注入风险:当Markdown格式的技能文档被渲染为HTML时,HTML注释块可能对人工审阅者不可见,但其原始文本仍可能被逐字提供给模型。实验发现,DeepSeek-V3.2和GLM-4.5-Air模型会受到附加在合法技能文档末尾的隐藏注释中恶意指令的影响,产生包含敏感工具意图的输出。通过在系统提示中添加简短防御性指令——将技能视为不可信来源并禁止敏感操作——可有效阻止此类恶意工具调用,同时使可疑的隐藏指令暴露出来。