During research, domain experts often ask analytical questions whose answers require integrating data from a wide range of web sources. As a result, they must spend substantial effort searching for, extracting, and organizing raw data before analysis can begin. We formalize this process as the SODIUM task, in which open domains such as the web are conceptualized as latent databases that must be systematically instantiated to support downstream querying. Solving SODIUM requires (1) in-depth, specialized exploration of the open web, strengthened by (2) exploiting structural correlations for systematic information extraction and (3) integrating the collected information into coherent, queryable database instances. To quantify the challenges of automating SODIUM, we construct SODIUM-Bench, a benchmark of 105 tasks derived from published academic papers across 6 domains, in which systems must explore the open web to collect and aggregate data from diverse sources into structured tables. Existing systems struggle with SODIUM tasks: among 6 advanced AI agents evaluated on SODIUM-Bench, the strongest baseline achieves only 46.5% accuracy. To bridge this gap, we develop SODIUM-Agent, a multi-agent system composed of a web explorer and a cache manager. Powered by our proposed ATP-BFS algorithm and optimized through principled management of cached sources and navigation paths, SODIUM-Agent conducts deep and comprehensive web exploration and performs structurally coherent information extraction. SODIUM-Agent achieves 91.1% accuracy on SODIUM-Bench, roughly twice the accuracy of the strongest baseline and up to 73 times that of the weakest.