We introduce a self-play framework for semantic equivalence in Haskell, utilizing formal verification to guide adversarial training between a generator and an evaluator. The framework leverages Liquid Haskell proofs for validating equivalence and execution-based counterexamples for inequivalence, organized via a difficulty-aware curriculum. To facilitate this, we release \textbf{OpInstruct-HSx}, a synthetic dataset of $\approx$28k validated Haskell programs. Empirical experiments show that our evaluator transfers effectively to downstream tasks, achieving up to 13.3pp accuracy gain on EquiBench and consistent gains on PySecDB. Ablation studies on the SEQ-SINQ regimes indicate that while inequivalence supervision provides data volume, equivalence proofs are uniquely responsible for the model's reasoning capabilities. The entire training pipeline and dataset are publicly released on GitHub and Hugging Face respectively.
翻译:我们提出了一种基于Haskell语言的语义等价自对弈框架,利用形式化验证指导生成器与评估器之间的对抗训练。该框架借助Liquid Haskell证明进行等价性验证,并通过基于执行的对抗样本处理不等价情况,采用难度感知的课程学习策略进行组织。为支撑该框架,我们发布了包含约2.8万个经过验证的Haskell程序的合成数据集\textbf{OpInstruct-HSx}。实验表明,我们的评估器在下游任务中实现了有效迁移,在EquiBench上获得高达13.3个百分点的准确率提升,并在PySecDB上取得一致性能增益。针对SEQ-SINQ机制的消融研究显示,虽然不等价监督提供了数据规模优势,但等价性证明才是模型推理能力的唯一来源。整个训练流程及数据集已分别在GitHub和Hugging Face平台公开发布。