The Synthetic Web: Adversarially-Curated Mini-Internets for Diagnosing Epistemic Weaknesses of Language Agents

Language agents increasingly act as web-enabled systems that search, browse, and synthesize information from diverse sources. However, these sources can include unreliable or adversarial content, and the robustness of agents to adversarial ranking, where misleading information appears prominently in search results, remains poorly understood. Existing benchmarks evaluate functional navigation or static factuality but cannot causally isolate this vulnerability, and current mitigation strategies for retrieval-augmented generation remain largely untested under such conditions. We introduce the Synthetic Web Benchmark, a procedurally generated environment comprising thousands of hyperlinked articles with ground-truth labels for credibility and factuality, process-level interaction traces, and contamination filtering to eliminate training-data leakage. By injecting a single high-plausibility misinformation article at a controllable search rank, we measure the causal effect of adversarial exposure on six frontier models. The results reveal catastrophic failures: accuracy collapses despite unlimited access to truthful sources, with minimal search escalation and severe miscalibration. These findings expose fundamental limitations in how current frontier models handle conflicting information, with immediate implications for deployment in high-stakes domains. Our benchmark enables systematic analysis of these failure modes and provides a controlled testbed for evaluating mitigation strategies under adversarial ranking, filling a gap in current research. This work establishes a reproducible baseline for developing search-robust, epistemically humble agents capable of resisting manipulation.
