Autoformalization is the task of translating natural language materials into machine-verifiable formalisations. Progress in autoformalization research is hindered by the lack of a sizeable dataset consisting of informal-formal pairs expressing the same essence. Existing methods tend to circumvent this challenge by manually curating small corpora or using few-shot learning with large language models. But these methods suffer from data scarcity and formal language acquisition difficulty. In this work, we create $\texttt{MMA}$, a large, flexible, multilingual, and multi-domain dataset of informal-formal pairs, by using a language model to translate in the reverse direction, that is, from formal mathematical statements into corresponding informal ones. Experiments show that language models fine-tuned on $\texttt{MMA}$ produce $16-18\%$ of statements acceptable with minimal corrections on the $\texttt{miniF2F}$ and $\texttt{ProofNet}$ benchmarks, up from $0\%$ with the base model. We demonstrate that fine-tuning on multilingual formal data results in more capable autoformalization models even when deployed on monolingual tasks.
翻译:自动形式化是将自然语言材料转换为机器可验证的形式表示的任务。自动形式化研究的进展受到缺乏表达相同本质的非形式化-形式化配对大规模数据集的阻碍。现有方法通常通过手动整理小型语料库或使用大型语言模型进行少样本学习来规避这一挑战,但这些方法受限于数据稀缺和形式语言习得困难。在本研究中,我们通过使用语言模型进行反向翻译(即从形式化的数学陈述转换为对应的非形式化陈述),创建了$\texttt{MMA}$——一个大型、灵活、多语言、多领域的非形式化-形式化配对数据集。实验表明,在$\texttt{MMA}$上微调的语言模型在$\texttt{miniF2F}$和$\texttt{ProofNet}$基准测试中,生成仅需最小修正的可接受陈述比例从基础模型的$0\%$提升至$16-18\%$。我们证明,即使在单语言任务中部署,对多语言形式数据进行微调也能产生能力更强的自动形式化模型。