Scientific analysis requires accurately interpreting complex multimodal knowledge, integrating evidence from different sources, and drawing inferences grounded in domain-specific knowledge. Current artificial intelligence (AI) systems, however, struggle to demonstrate these capabilities consistently. The complexity and variability of scientific tables and figures, combined with heterogeneous structures and long-context requirements, pose fundamental obstacles to scientific table \& figure analysis. To quantify these challenges, we introduce AnaBench, a large-scale benchmark of $63{,}178$ instances from nine scientific domains, systematically categorized along seven complexity dimensions. To address them, we propose Anagent, a multi-agent framework for scientific table \& figure analysis built on four specialized agents: Planner decomposes tasks into actionable subtasks, Expert retrieves task-specific information through targeted tool execution, Solver synthesizes information into a coherent analysis, and Critic performs iterative refinement through five-dimensional quality assessment. We further develop modular training strategies that leverage supervised finetuning and specialized reinforcement learning to optimize individual capabilities while maintaining effective collaboration. Comprehensive evaluation across 170 subdomains shows that Anagent achieves substantial improvements, up to $\uparrow 13.43\%$ in training-free settings and $\uparrow 42.12\%$ with finetuning, and reveals that task-oriented reasoning and context-aware problem-solving are essential for high-quality scientific table \& figure analysis. Project page: https://xhguo7.github.io/Anagent/.
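As a rough illustration of the four-agent control flow described above, the following Python sketch wires a Planner, Expert, Solver, and Critic into a single loop. It is a minimal sketch under stated assumptions, not the Anagent implementation: the function `run_pipeline`, the `Result` type, the callable signatures, the score `threshold`, and the `max_rounds` cap are all hypothetical, and the abstract does not name the five quality dimensions, so the Critic here simply returns a mapping from dimension name to score.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical agent interfaces (assumptions, not the Anagent API).
Planner = Callable[[str], List[str]]            # task -> subtasks
Expert = Callable[[str], str]                   # subtask -> retrieved evidence
Solver = Callable[[str, List[str], str], str]   # task, evidence, feedback -> analysis
Critic = Callable[[str, str], Dict[str, float]] # task, analysis -> score per dimension


@dataclass
class Result:
    analysis: str
    scores: Dict[str, float]
    rounds: int


def run_pipeline(
    task: str,
    planner: Planner,
    expert: Expert,
    solver: Solver,
    critic: Critic,
    threshold: float = 0.8,  # assumed acceptance bar, not from the paper
    max_rounds: int = 3,     # assumed cap on refinement rounds
) -> Result:
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")

    # Planner: decompose the task into subtasks.
    subtasks = planner(task)
    # Expert: gather task-specific evidence for each subtask.
    evidence = [expert(subtask) for subtask in subtasks]

    feedback = ""
    analysis = ""
    scores: Dict[str, float] = {}
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        # Solver: synthesize evidence (plus any Critic feedback) into an analysis.
        analysis = solver(task, evidence, feedback)
        # Critic: score the analysis along each quality dimension.
        scores = critic(task, analysis)
        if scores and min(scores.values()) >= threshold:
            break
        # Turn the weakest dimensions into feedback for the next round.
        weak = sorted(name for name, score in scores.items() if score < threshold)
        feedback = "Improve: " + ", ".join(weak)
    return Result(analysis=analysis, scores=scores, rounds=rounds)
```

Passing the agents as plain callables keeps the sketch independent of any particular model or tool backend; in practice each callable would presumably wrap a model call or tool invocation.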