Previous studies have introduced a weakly supervised paradigm for solving math word problems that requires only the answer value as annotation. These methods search for equation candidates that evaluate to the correct value and use them as pseudo labels, but they explore only a narrow sub-space of the enormous equation space. To address this problem, we propose \textbf{ComSearch}, a novel search algorithm with a combinatorial strategy that compresses the search space by excluding mathematically equivalent equations. The compression allows the search algorithm to enumerate all possible equations and obtain high-quality data. We also investigate the noise in pseudo labels whose mathematical logic is wrong, which we refer to as the \textit{false-matching} problem, and propose a ranking model to denoise the pseudo labels. Our approach is a flexible framework in which two existing supervised math word problem solvers are trained on the pseudo labels, and both achieve state-of-the-art performance on the weakly supervised task.
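To make the idea of excluding mathematically equivalent equations concrete, the following is a minimal Python sketch. It is not the ComSearch combinatorial strategy itself: it swaps in a simpler brute-force enumeration and deduplicates by evaluating each expression at fixed probe points, so that equivalent expressions share a fingerprint. The names `PROBES`, `enumerate_equations`, and `find_candidates` are hypothetical, and the sketch assumes integer inputs, the four basic operators, and that every number is used exactly once.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical probe points (distinct primes), an assumption of this sketch.
# Each row assigns one substitute value to every input position.
PROBES = (
    (101, 103, 107, 109, 113, 127),
    (131, 137, 139, 149, 151, 157),
)
OPS = "+-*/"


def _apply(op, a, b):
    """Apply op exactly; None marks an undefined result (division by zero)."""
    if a is None or b is None:
        return None
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    return a / b if b != 0 else None


def enumerate_equations(numbers):
    """Return (unique, raw): one expression per fingerprint, and the raw count."""
    if len(numbers) > len(PROBES[0]):
        raise ValueError("not enough probe values for this many numbers")
    # Each item is (expression string, exact value, fingerprint at probe points).
    leaves = [
        (str(x), Fraction(x), tuple(Fraction(row[i]) for row in PROBES))
        for i, x in enumerate(numbers)
    ]
    unique = {}
    raw = 0

    def expand(items):
        nonlocal raw
        if len(items) == 1:
            expr, value, fingerprint = items[0]
            raw += 1
            # Keep only the first expression seen for each fingerprint.
            unique.setdefault(fingerprint, (expr, value))
            return
        # Combine every ordered pair of items with every operator.
        for i, j in permutations(range(len(items)), 2):
            rest = [items[k] for k in range(len(items)) if k != i and k != j]
            (ea, va, fa), (eb, vb, fb) = items[i], items[j]
            for op in OPS:
                value = _apply(op, va, vb)
                if value is None:
                    continue  # undefined on the real inputs, skip
                fingerprint = tuple(_apply(op, x, y) for x, y in zip(fa, fb))
                expand(rest + [(f"({ea}{op}{eb})", value, fingerprint)])

    expand(leaves)
    return unique, raw


def find_candidates(numbers, target):
    """Non-equivalent expressions whose exact value equals the target answer."""
    unique, _ = enumerate_equations(numbers)
    goal = Fraction(target)
    return [expr for expr, value in unique.values() if value == goal]


unique, raw = enumerate_equations([2, 3])
print(raw, len(unique))
print(find_candidates([2, 3, 4], 14))
```

For the inputs 2 and 3, the sketch visits 8 raw expressions but keeps 6, because `2+3` and `3+2` (and likewise the two products) collapse to one entry. For the inputs 2, 3, 4 with answer 14, two non-equivalent candidates remain, a sum of a product and a product of a sum. Both match the answer, yet at most one can reflect the logic of a given problem, which illustrates why matching candidates still need to be denoised. Fingerprinting at probe points is a probabilistic shortcut: distinct expressions could in principle collide, so it should be read as an illustration only.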