ScratchLens: Lens-Parametric Behavioral Equivalence for Scratch Programs

Two Scratch programs can be syntactically far apart-renamed variables, split scripts, extracted custom blocks, or reordered initialization-and still behave identically; a one-block edit, such as replacing a blocking broadcast with an asynchronous one, can create divergences visible only under specific schedules. Deciding behavioral equivalence is central to automated feedback, grading support, and repair validation, yet tree differencing is too strict and single-run dynamic comparison is unsound for concurrent, random, and timing-dependent behavior. We observe that equivalence for block-based programs is lens-parametric: final state, frame traces, monitors, event causality, and debug traces induce different observation relations. ScratchLens makes this explicit through a taxonomy of causal divergence phenomena and observation lenses. It compiles Scratch projects into a causal IR of typed resources and semantic transactions, canonicalizes renamings, guards, and procedure bodies, quotients same-trigger concurrency with Mazurkiewicz trace normal forms, separates program order from races, and handles residual frontiers through SMT obligations and VM-backed counterexample-guided refinement. Conclusive verdicts carry evidence: equivalence by bijection and trace quotient, difference by a typed witness, and unresolved cases remain unknown. On a VM-witnessed mutation corpus from real Scratch projects, ScratchLens decides all 444 validated pairs and makes 0/158 false-equivalence claims on witnessed-different pairs under strict scoring. Structural, dynamic-only, and LLM baselines fail on the classes predicted by the taxonomy; ablations quantify the contribution of partial-order reduction and lens parametricity; and targeted scenarios expose ambiguous-mutant divergences missed by random testing.

翻译：两个 Scratch 程序可能在语法上差异显著——变量重命名、脚本拆分、自定义块提取或初始化顺序重排——但行为仍然完全相同；而一个块的编辑（例如将阻塞广播替换为异步广播）可能仅在特定调度下显现出差异。判定行为等价性对于自动化反馈、评分支持和修复验证至关重要，然而树差分过于严苛，而对于并发、随机及时序依赖行为，单次运行动态比较并不完备。我们观察到，基于块程序的等价性是透镜参数化的：最终状态、帧轨迹、监视器、事件因果和调试轨迹会引发不同的观察关系。ScratchLens 通过因果发散现象和观察透镜的分类体系明确了这一特性。它将 Scratch 项目编译为类型化资源和语义事务的因果中间表示，对重命名、守卫及过程体进行规范化，通过 Mazurkiewicz 迹范式对同触发并发进行商化，分离程序顺序与竞争，并通过 SMT 义务和基于虚拟机驱动的反例引导精化处理剩余前沿。确定性判决附带证据：通过双射和迹商化证明等价，通过类型化证据证明差异，未决案例则保持未知。在源自真实 Scratch 项目的虚拟机验证突变语料库上，ScratchLens 判定了所有 444 个经验证的配对，并在严格评分下，对验证为不同的配对未做出任何假等价声明（0/158）。结构基线、纯动态基线和大型语言模型基线在分类体系预测的类别上均失败；消融实验量化了偏序约简和透镜参数化的贡献；针对性场景揭示了随机测试未能发现的歧义突变发散。