Numerical reasoning over table-and-text hybrid passages, such as financial reports, poses significant challenges and has numerous potential applications. Noise and irrelevant variables in the model input hinder model performance. Additionally, coarse-grained supervision over the whole solution program impedes the model's ability to learn the underlying numerical reasoning process. In this paper, we propose three pretraining tasks that operate at both the whole-program and sub-program levels: Variable Integrity Ranking, which guides the model to focus on useful variables; Variable Operator Prediction, which decomposes the supervision into fine-grained single-operator prediction; and Variable Keyphrase Masking, which encourages the model to identify the key evidence from which sub-programs are derived. Experimental results demonstrate the effectiveness of the proposed methods, which surpass transformer-based model baselines.
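To make the distinction between whole-program and sub-program supervision concrete, the following is a minimal sketch, not the paper's implementation. It assumes a solution program can be written as a nested binary expression over named variables, and it flattens that expression into single-operator steps of the kind a fine-grained objective could supervise individually. The program representation, the operator set, the function names (`decompose`, `used_variables`, `evaluate`), and the example variables and values are all hypothetical.

```python
from typing import Dict, List, Set, Tuple, Union

# Assumed representation: a program is a variable name or (operator, left, right).
Program = Union[str, Tuple[str, "Program", "Program"]]
Step = Tuple[str, str, str, str]  # (result_name, operator, left_arg, right_arg)

# Hypothetical operator set; a real system may define others.
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}


def decompose(program: Program) -> List[Step]:
    """Flatten a nested program into single-operator steps.

    Intermediate results are named #0, #1, ... in evaluation order.
    """
    steps: List[Step] = []

    def visit(node: Program) -> str:
        if isinstance(node, str):
            return node  # a leaf is an input variable
        op, left, right = node
        left_name = visit(left)
        right_name = visit(right)
        name = f"#{len(steps)}"
        steps.append((name, op, left_name, right_name))
        return name

    visit(program)
    return steps


def used_variables(steps: List[Step]) -> Set[str]:
    """Return the input variables that the program actually references."""
    args = {arg for _, _, left, right in steps for arg in (left, right)}
    return {arg for arg in args if not arg.startswith("#")}


def evaluate(steps: List[Step], values: Dict[str, float]) -> float:
    """Execute the steps in order and return the value of the last one."""
    if not steps:
        raise ValueError("program has no operator steps")
    env = dict(values)
    for name, op, left, right in steps:
        env[name] = OPS[op](env[left], env[right])
    return env[steps[-1][0]]


# Hypothetical example: relative change in revenue between two years.
program: Program = (
    "divide",
    ("subtract", "revenue_2020", "revenue_2019"),
    "revenue_2019",
)
# "headcount" stands in for an irrelevant variable present in the input.
values = {"revenue_2020": 120.0, "revenue_2019": 100.0, "headcount": 50.0}

steps = decompose(program)
result = evaluate(steps, values)
relevant = used_variables(steps)
```

In this toy setup, each entry of `steps` is a single-operator sub-program, and `relevant` excludes `headcount`, which illustrates the kind of irrelevant input variable the abstract refers to.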