Math word problem (MWP) solving aims to automatically answer mathematical questions posed in natural-language text. Previous studies tend to design complex models that capture additional information from the original text so that the model learns more comprehensive features. In this paper, we take the opposite direction and study how to discard redundant features that carry spurious correlations for MWP. To this end, we design an Expression Syntax Information Bottleneck method for MWP (called ESIB), based on the variational information bottleneck, which extracts the essential features of the expression syntax tree while filtering out latent-specific redundancy that contains syntax-irrelevant features. The key idea of ESIB is to use mutual learning to encourage multiple models to predict the same expression syntax tree from different representations of the same problem, so that they capture the consistent information of the expression syntax tree and discard latent-specific redundancy. To improve the generalization ability of the model and generate more diverse expressions, we design a self-distillation loss that encourages the model to rely more on the expression syntax information in the latent space. Experimental results on two large-scale benchmarks show that our model not only achieves state-of-the-art results but also generates more diverse solutions. The code is available.
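To make the ingredients named in the abstract concrete, the following is a minimal sketch of how a variational information bottleneck term and a mutual-learning agreement term are commonly combined into one per-model loss. It is not the ESIB objective itself: the function names, the weights `beta` and `alpha`, the standard-normal prior, and the direction of the agreement KL (peer to own, as in generic deep mutual learning) are all assumptions made for illustration, and the self-distillation loss is not covered.

```python
import math

def kl_diag_gaussian_to_standard(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ): a common VIB compression term.
    The standard-normal prior is an assumption, not taken from the abstract."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_categorical(p, q):
    """KL(p || q) for two categorical distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def vib_mutual_loss(mu, log_var, own_logits, peer_logits, target_index,
                    beta=1e-3, alpha=1.0):
    """Hypothetical per-model loss for one decoding step:
    task cross-entropy + beta * compression KL + alpha * agreement KL.
    `beta` and `alpha` are illustrative weights, not values from the paper."""
    own = softmax(own_logits)
    peer = softmax(peer_logits)
    task = -math.log(own[target_index])                    # predict the gold token
    compress = kl_diag_gaussian_to_standard(mu, log_var)   # squeeze the latent code
    agree = kl_categorical(peer, own)                      # match the peer's prediction
    return task + beta * compress + alpha * agree
```

In this sketch each of the two models would call `vib_mutual_loss` with its own latent statistics and logits and with the other model's logits as `peer_logits`. When both models already agree and the latent code equals the prior, the loss reduces to the plain cross-entropy term.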