LLM-based web agents have become increasingly popular for their utility in daily life and work. However, they exhibit critical vulnerabilities when processing malicious URLs: accepting a disguised malicious URL enables subsequent access to unsafe webpages, which can cause severe harm to service providers and users. Despite this risk, no benchmark currently targets this emerging threat. To address this gap, we propose MalURLBench, the first benchmark for evaluating LLMs' vulnerability to malicious URLs. MalURLBench contains 61,845 attack instances spanning 10 real-world scenarios and 7 categories of real malicious websites. Experiments with 12 popular LLMs reveal that existing models struggle to detect elaborately disguised malicious URLs. We further identify and analyze key factors that affect attack success rates, and propose URLGuard, a lightweight defense module. We believe this work provides a foundational resource for advancing the security of web agents. Our code is available at https://github.com/JiangYingEr/MalURLBench.