Data analytics using GUI-based workflows is an iterative process in which an analyst repeatedly modifies a workflow to refine it, producing a new version at each iteration. In many cases, the result of executing a workflow version is equivalent to the result of a previously executed version. Identifying such equivalence between the execution results of different workflow versions is important for optimizing workflow performance, because results from a previous run can be reused. The size of the workflows and the complexity of their operators often leave existing equivalence verifiers (EVs) unable to solve the problem. In this paper, we present Veer, which leverages the fact that two workflow versions can be very similar except for a few changes. The solution divides the workflow version pair into small parts, called windows, and verifies the equivalence within each window by using an existing EV as a black box. We develop solutions to efficiently generate windows and verify the equivalence within each window. Our experiments on real workflows show that Veer not only verifies the equivalence of workflows that existing EVs cannot support but also does so efficiently.
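The window idea can be illustrated with a minimal sketch. It is not Veer's algorithm: it assumes a workflow is a linear list of operator strings, that both versions have the same length with operators aligned by position, and that the EV is a callable returning `True` (verified equivalent) or `None` (cannot decide). The names `split_into_windows`, `verify_versions`, and `toy_ev` are hypothetical, and `toy_ev` is a stand-in that handles at most two operators and only knows that adjacent filters commute.

```python
from typing import Callable, List, Optional, Tuple

Pipeline = List[str]                      # assumption: a workflow is a linear operator list
Window = Tuple[Pipeline, Pipeline]        # matching slices of the two versions
EV = Callable[[Pipeline, Pipeline], Optional[bool]]  # True = equivalent, None = unknown


def split_into_windows(p: Pipeline, q: Pipeline) -> List[Window]:
    """Group consecutive positions where the versions differ into windows."""
    if len(p) != len(q):
        raise ValueError("this sketch assumes position-aligned, equal-length versions")
    windows: List[Window] = []
    start = None  # index where the current run of changes began
    for i, (a, b) in enumerate(zip(p, q)):
        if a != b and start is None:
            start = i
        elif a == b and start is not None:
            windows.append((p[start:i], q[start:i]))
            start = None
    if start is not None:
        windows.append((p[start:], q[start:]))
    return windows


def verify_versions(p: Pipeline, q: Pipeline, ev: EV) -> Optional[bool]:
    """Return True if the black-box EV verifies every changed window, else None."""
    for window_p, window_q in split_into_windows(p, q):
        if ev(window_p, window_q) is not True:
            return None  # unknown: the EV could not verify this window
    return True


def toy_ev(window_p: Pipeline, window_q: Pipeline, max_ops: int = 2) -> Optional[bool]:
    """Hypothetical EV: limited input size, knows only that filters commute."""
    if len(window_p) > max_ops or len(window_q) > max_ops:
        return None  # input too large for this verifier
    only_filters = all(op.startswith("filter ") for op in window_p + window_q)
    if only_filters and sorted(window_p) == sorted(window_q):
        return True
    return None


v1 = ["scan", "filter a>1", "filter b<5", "join", "aggregate",
      "filter c=0", "filter d=1", "sink"]
v2 = ["scan", "filter b<5", "filter a>1", "join", "aggregate",
      "filter d=1", "filter c=0", "sink"]

whole_pair = toy_ev(v1, v2)                    # None: the full pair exceeds the EV's limit
by_windows = verify_versions(v1, v2, toy_ev)   # True: each two-operator window is verified
```

In this sketch the EV cannot handle the full eight-operator pair, yet each of the two changed windows is small enough for it to verify. A `None` result means only that equivalence could not be established, not that the versions differ.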