This paper examines how different types of large language model (LLM) agents perform on scientific visualization (SciVis) tasks, in which users generate visualization workflows from natural-language instructions. We compare three primary interaction paradigms: domain-specific agents with structured tool use, computer-use agents, and general-purpose coding agents. We evaluate eight representative agents across 15 benchmark tasks and measure visualization quality, efficiency, robustness, and computational cost. We further analyze interaction modalities, including code scripts and Model Context Protocol (MCP) or API calls for structured tool use, as well as command-line interfaces (CLI) and graphical user interfaces (GUI) for more general interaction, and we study the effect of persistent memory in selected agents. The results reveal clear tradeoffs across paradigms and modalities. General-purpose coding agents achieve the highest task success rates but are computationally expensive, while domain-specific agents are more efficient and stable but less flexible. Computer-use agents perform well on individual steps but struggle with longer multi-step workflows, indicating that long-horizon planning is their primary limitation. In both CLI- and GUI-based settings, persistent memory improves performance over repeated trials, although its benefits depend on the underlying interaction mode and the quality of feedback. These findings suggest that no single approach is sufficient, and that future SciVis systems should combine structured tool use, interactive capabilities, and adaptive memory mechanisms to balance performance, robustness, and flexibility.