Internet measurement faces twin challenges: complex analyses require expert-level orchestration of tools, yet even syntactically correct implementations can harbor methodological flaws that are difficult to verify. Democratizing measurement capabilities thus demands automating both workflow generation and verification against methodological standards established through decades of research. We present Airavat, the first agentic framework for Internet measurement workflow generation with systematic verification and validation. Airavat coordinates a set of agents mirroring expert reasoning: three agents handle problem decomposition, solution design, and code implementation, assisted by a registry of existing tools. Two specialized engines ensure methodological correctness: a Verification Engine evaluates workflows against a knowledge graph encoding five decades of measurement research, while a Validation Engine identifies appropriate validation techniques grounded in established methodologies. Through four Internet measurement case studies, we demonstrate that Airavat (i) generates workflows matching expert-level solutions, (ii) makes sound architectural decisions, (iii) addresses novel problems without ground truth, and (iv) identifies methodological flaws missed by standard execution-based testing.