Math word problem (MWP) solving aims to automatically solve mathematical questions given in text. Previous studies tend to design complex models that capture additional information from the original text so that the model gains more comprehensive features. In this paper, we turn our attention in the opposite direction and study how to discard redundant features containing spurious correlations for MWP. To this end, we design an Expression Syntax Information Bottleneck method for MWP (called ESIB) based on the variational information bottleneck, which extracts the essential features of the expression syntax tree while filtering out latent-specific redundancy containing syntax-irrelevant features. The key idea of ESIB is to encourage multiple models to predict the same expression syntax tree for different representations of the same problem via mutual learning, so as to capture the consistent information of the expression syntax tree and discard latent-specific redundancy. To improve the generalization ability of the model and generate more diverse expressions, we design a self-distillation loss that encourages the model to rely more on the expression syntax information in the latent space. Experimental results on two large-scale benchmarks show that our model not only achieves state-of-the-art results but also generates more diverse solutions. The code is available at https://github.com/menik1126/math_ESIB.
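To make the information-bottleneck component concrete, the following is a minimal sketch of the generic variational information bottleneck (VIB) objective that ESIB builds on: a task loss plus a weighted KL term that compresses the latent representation toward a standard normal prior. This is an illustrative sketch only, not the authors' ESIB implementation; the function names and the NumPy formulation are assumptions.

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ).

    mu, logvar: arrays giving the mean and log-variance of the
    diagonal-Gaussian posterior q(z|x) produced by the encoder.
    """
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

def vib_objective(task_loss, mu, logvar, beta=1e-3):
    """IB Lagrangian: fit the task while compressing the latent code.

    beta trades off task performance against how much problem-specific
    (potentially spurious) information the latent code retains.
    """
    return task_loss + beta * kl_to_standard_normal(mu, logvar)
```

When the posterior already matches the prior (mu = 0, logvar = 0), the KL penalty vanishes and the objective reduces to the task loss alone; increasing `beta` squeezes out more latent-specific information, which is the "discard redundancy" intuition of the abstract.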