ScratchLens: Lens-Parametric Behavioral Equivalence for Scratch Programs

Two Scratch programs can be syntactically far apart-renamed variables, split scripts, extracted custom blocks, or reordered initialization-and still behave identically; a one-block edit, such as replacing a blocking broadcast with an asynchronous one, can create divergences visible only under specific schedules. Deciding behavioral equivalence is central to automated feedback, grading support, and repair validation, yet tree differencing is too strict and single-run dynamic comparison is unsound for concurrent, random, and timing-dependent behavior. We observe that equivalence for block-based programs is lens-parametric: final state, frame traces, monitors, event causality, and debug traces induce different observation relations. SPECTRA makes this explicit through a taxonomy of causal divergence phenomena and observation lenses. It compiles Scratch projects into a causal IR of typed resources and semantic transactions, canonicalizes renamings, guards, and procedure bodies, quotients same-trigger concurrency with Mazurkiewicz trace normal forms, separates program order from races, and handles residual frontiers through SMT obligations and VM-backed counterexample-guided refinement. Conclusive verdicts carry evidence: equivalence by bijection and trace quotient, difference by a typed witness, and unresolved cases remain unknown. On a VM-witnessed mutation corpus from real Scratch projects, SPECTRA decides all 444 validated pairs and makes 0/158 false-equivalence claims on witnessed-different pairs under strict scoring. Structural, dynamic-only, and LLM baselines fail on the classes predicted by the taxonomy; ablations quantify the contribution of partial-order reduction and lens parametricity; and targeted scenarios expose ambiguous-mutant divergences missed by random testing.

翻译：两个Scratch程序可能在语法上相距甚远——重命名变量、拆分脚本、提取自定义积木或重排序初始化——但仍表现出相同的行为；而单积木编辑（如将阻塞广播替换为异步广播）可能仅在特定调度下才显现差异。判定行为等价性对于自动化反馈、评分支持和修复验证至关重要，然而树形差异比较过于严格，单次运行动态比较对于并发、随机和时序依赖行为并不可靠。我们发现：积木程序的等价性是透镜参数化的——最终状态、帧迹、监视器、事件因果性和调试迹会引发不同的观察关系。SPECTRA通过因果发散现象与观察透镜的分类体系使这一特性显式化。它将Scratch项目编译为类型化资源与语义事务的因果中间表示，对重命名、守卫和保护体进行规范变换，通过Mazurkiewicz迹范式对相同触发源的并发进行商化处理，分离程序顺序与竞态关系，并通过SMT约束与虚拟机支撑的反例引导精化处理剩余边界。结论性判决附带证据：双射与迹商化证明等价性，类型化证明确认差异性，未决案例则标记为未知。基于真实Scratch项目的虚拟机见证变异语料库中，SPECTRA判定全部444个已验证对，并在严格评分下对158个见证差异对给出0次错误等价断言。结构差异、纯动态方法与LLM基线在分类体系所预测的类别上均告失败；消融实验量化了偏序归约与透镜参数化的贡献；目标场景揭示了随机测试遗漏的模糊变异体分歧。