ATWL: A Formal Language for Representing, Comparing, and Reusing Visual Analytics Workflows

Visual analytics (VA) workflows are inherently complex, involving data transformation, feature engineering, visual representation, and human interpretation. They are typically described in unstructured prose, which hinders systematic comparison, reuse of proven strategies, and training of novices. We present the Artifact-Transform Workflow Language (ATWL), a domain-agnostic, declarative language that formally represents VA workflows by capturing both their structure and their underlying analytical intent. ATWL is built upon a modular ontology of eight artifact types (entities, features, arrangements, visualisations, patterns, models, knowledge, specifications) and transforms characterised by standardised intents (e.g., define-unit, characterise, contextualise, abstract). To show that the effort of formalisation need not impede adoption, we extract workflows from research papers through supervised interaction with LLM agents, reducing the human role to review and refinement. Using this process, we constructed a library of seventeen ATWL workflows from published VA papers. Cross-workflow analysis reveals structural regularities that remain invisible in prose: a recurrent meta-structure, recurring motifs, reusable building blocks, diverse iterative strategies, and cross-domain equivalences. We further evaluate practical utility through a controlled experiment in which the same LLM addressed two analytical problems with the library supplied either as the original papers or as ATWL representations. Both forms enabled useful recommendations, but the formal representation systematically added explicit iteration structure, typed data flow, and fragment-level adaptation provenance, and its compactness allows a library to scale beyond what prose libraries can fit in an LLM's context. ATWL enables a transition from narrative descriptions to formally represented, comparable, and reusable analytical knowledge.
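
To make the idea of typed artifacts linked by intent-labelled transforms more concrete, the following Python sketch models a toy workflow. It is not ATWL syntax, which the abstract does not show. The eight artifact types and the intent names are taken from the abstract; the classes `Artifact` and `Transform`, the helpers `undefined_inputs` and `intent_sequence`, the example artifact names, and the pairing of intents with artifact types are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class ArtifactType(Enum):
    # The eight artifact types listed in the abstract.
    ENTITIES = "entities"
    FEATURES = "features"
    ARRANGEMENTS = "arrangements"
    VISUALISATIONS = "visualisations"
    PATTERNS = "patterns"
    MODELS = "models"
    KNOWLEDGE = "knowledge"
    SPECIFICATIONS = "specifications"


@dataclass(frozen=True)
class Artifact:
    name: str
    type: ArtifactType


@dataclass(frozen=True)
class Transform:
    intent: str                    # e.g. "define-unit", "characterise"
    inputs: Tuple[Artifact, ...]   # artifacts consumed by this step
    output: Artifact               # artifact produced by this step


def undefined_inputs(steps: List[Transform], sources: List[Artifact]):
    """Return (intent, artifact name) pairs for inputs used before being produced."""
    available = {a.name for a in sources}
    problems = []
    for step in steps:
        for artifact in step.inputs:
            if artifact.name not in available:
                problems.append((step.intent, artifact.name))
        available.add(step.output.name)
    return problems


def intent_sequence(steps: List[Transform]):
    """Reduce a workflow to its ordered list of intents, for simple comparison."""
    return [step.intent for step in steps]


# Hypothetical example workflow (names and type assignments are illustrative).
records = Artifact("records", ArtifactType.ENTITIES)
trips = Artifact("trips", ArtifactType.ENTITIES)
trip_features = Artifact("trip_features", ArtifactType.FEATURES)
clusters = Artifact("clusters", ArtifactType.PATTERNS)

workflow = [
    Transform("define-unit", (records,), trips),
    Transform("characterise", (trips,), trip_features),
    Transform("abstract", (trip_features,), clusters),
]

print(undefined_inputs(workflow, [records]))
print(intent_sequence(workflow))
```

Once workflows are held in a structured form like this, data-flow checks and cross-workflow comparison (here, simply comparing intent sequences) become mechanical operations rather than close reading of prose, which is the kind of benefit the abstract attributes to formal representation.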
