Scientific visualization pipelines encode domain-specific procedural knowledge with strict execution dependencies, making their construction sensitive to missing stages, incorrect operator usage, and improper ordering. Generating executable scientific visualization pipelines from natural-language descriptions therefore remains challenging for large language models (LLMs), particularly in web-based environments where visualization authoring relies on explicit code-level pipeline assembly. In this work, we investigate the reliability of LLM-based scientific visualization pipeline generation, focusing on vtk.js as a representative web-based visualization library. We propose a structure-aware retrieval-augmented generation workflow that supplies pipeline-aligned vtk.js code examples as contextual guidance, supporting correct module selection, parameter configuration, and execution order. We evaluate the proposed workflow across multiple multi-stage scientific visualization tasks and LLMs, measuring reliability in terms of pipeline executability and human correction effort. To this end, we introduce correction cost as a metric for the amount of manual intervention required to obtain a valid pipeline. Our results show that structured, domain-specific context substantially improves pipeline executability and reduces correction cost. We additionally provide an interactive analysis interface that supports human-in-the-loop inspection and systematic evaluation of generated visualization pipelines.