Scientific workflow systems automate execution (scheduling, fault tolerance, resource management) but not the semantic translation that precedes it. Scientists still convert research questions into workflow specifications by hand, a task that requires both domain knowledge and infrastructure expertise. We propose an agentic architecture that closes this gap through three layers: an LLM interprets natural language into structured intents (semantic layer); validated generators produce reproducible workflow DAGs (deterministic layer); and domain experts author ``Skills'', Markdown documents that encode vocabulary mappings, parameter constraints, and optimization strategies (knowledge layer). This decomposition confines LLM non-determinism to intent extraction: identical intents always yield identical workflows. We implement and evaluate the architecture on the 1000 Genomes population genetics workflow and Hyperflow WMS running on Kubernetes. In an ablation study on 150 queries, Skills raise full-match intent accuracy from 44\% to 83\%; skill-driven deferred workflow generation reduces data transfer by 92\%; and the end-to-end pipeline completes queries on Kubernetes with LLM overhead below 15 seconds and cost under \$0.001 per query.
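The determinism claim above (identical intents always yield identical workflows) can be illustrated with a minimal sketch. It assumes an intent is a small immutable record and that the generator is a pure function of that record; the `Intent` fields, the task naming scheme, and the `fingerprint` helper are all hypothetical and are not taken from the implementation described here.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """Structured intent as a semantic layer might emit it (hypothetical schema)."""
    chromosomes: tuple
    populations: tuple
    analysis: str


def generate_workflow(intent: Intent) -> dict:
    """Deterministic layer sketch: a pure function from an intent to a DAG.

    No randomness, clock, or LLM call is involved, and inputs are sorted,
    so the same intent always produces the same tasks and edges.
    """
    tasks, edges = [], []
    for chrom in sorted(intent.chromosomes):
        extract = f"extract_chr{chrom}"  # hypothetical task name
        tasks.append(extract)
        for pop in sorted(intent.populations):
            analyze = f"{intent.analysis}_chr{chrom}_{pop}"
            tasks.append(analyze)
            edges.append([extract, analyze])  # analysis depends on extraction
    return {"tasks": tasks, "edges": edges}


def fingerprint(dag: dict) -> str:
    """Hash a canonical JSON encoding so two DAGs can be compared cheaply."""
    canonical = json.dumps(dag, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    intent = Intent(chromosomes=(2, 1), populations=("EUR", "AFR"), analysis="frequency")
    first = fingerprint(generate_workflow(intent))
    second = fingerprint(generate_workflow(intent))
    print(first == second)
```

Under these assumptions, reproducibility reduces to checking that two runs over the same intent give the same fingerprint; any variation between runs would then have to originate in intent extraction.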