Exploring Interaction Paradigms for LLM Agents in Scientific Visualization

This paper examines how different types of large language model (LLM) agents perform on scientific visualization (SciVis) tasks, in which users generate visualization workflows from natural-language instructions. We compare three primary interaction paradigms: domain-specific agents with structured tool use, computer-use agents, and general-purpose coding agents. We evaluate eight representative agents on 15 benchmark tasks and measure visualization quality, efficiency, robustness, and computational cost. We further analyze interaction modalities: code scripts and Model Context Protocol (MCP) or API calls for structured tool use, and command-line interfaces (CLI) and graphical user interfaces (GUI) for more general interaction. We also study the effect of persistent memory in selected agents. The results reveal clear tradeoffs across paradigms and modalities. General-purpose coding agents achieve the highest task success rates but are computationally expensive, while domain-specific agents are more efficient and stable but less flexible. Computer-use agents perform well on individual steps but struggle with longer multi-step workflows, indicating that long-horizon planning is their primary limitation. In both CLI- and GUI-based settings, persistent memory improves performance over repeated trials, although its benefits depend on the underlying interaction mode and the quality of feedback. These findings suggest that no single approach is sufficient, and that future SciVis systems should combine structured tool use, interactive capabilities, and adaptive memory mechanisms to balance performance, robustness, and flexibility.
