Phishing sites continue to grow in volume and sophistication. Recent work leverages large language models (LLMs) to analyze URLs, HTML, and rendered content to determine whether a website is phishing. While these approaches are promising, LLMs are inherently vulnerable to prompt injection (PI). Because attackers fully control every element of a phishing site, they can mount PI attacks that exploit the perceptual asymmetry between LLMs and humans: instructions imperceptible to end users are still parsed by the LLM and can stealthily manipulate its judgment. The specific risks of PI in phishing detection, and effective mitigations, remain largely unexplored. This paper presents the first comprehensive evaluation of PI against multimodal LLM-based phishing detection. We introduce a two-dimensional taxonomy, defined by Attack Techniques and Attack Surfaces, that captures realistic PI strategies. Using this taxonomy, we implement diverse attacks and empirically study several representative LLM-based detection systems. The results show that phishing detection with state-of-the-art models such as GPT-5 remains vulnerable to PI. We then propose InjectDefuser, a defense framework that combines prompt hardening, allowlist-based retrieval augmentation, and output validation. Across multiple models, InjectDefuser significantly reduces attack success rates. Our findings clarify the PI risk landscape and offer practical defenses that improve the reliability of next-generation phishing countermeasures.