Recent LLM-based information-seeking agents have achieved record-breaking performance on established benchmarks. However, these agents remain heavily reliant on knowledge indexed by search engines, leaving a critical blind spot: Unindexed Information Seeking (UIS). This paper identifies and explores the UIS problem, in which vital information is not captured by search engine crawlers, as with overlooked content, dynamic webpages, and embedded files. Despite its significance, UIS remains underexplored. To address this gap, we introduce UIS-QA, the first dedicated UIS benchmark, comprising 110 expert-annotated QA pairs. Notably, even state-of-the-art agents suffer a drastic performance drop on UIS-QA (e.g., from 70.90 on GAIA and 46.70 on BrowseComp-zh to 24.55 on UIS-QA), underscoring the severity of the problem. To mitigate this, we propose UIS-Digger, a novel multi-agent framework that incorporates dual-mode browsing and enables simultaneous webpage searching and file parsing. With a relatively small $\sim$30B-parameter backbone LLM optimized using SFT and RFT training strategies, UIS-Digger sets a strong baseline of 27.27\% on UIS-QA, outperforming systems that integrate sophisticated LLMs such as O3 and GPT-4.1. This demonstrates the importance of proactive interaction with unindexed sources for effective and comprehensive information seeking. Our work not only uncovers a fundamental limitation in current agent evaluation paradigms but also provides the first toolkit for advancing UIS research, defining a new and promising direction for robust information-seeking systems. The dataset has been released at: https://huggingface.co/datasets/UIS-Digger/UIS-QA.