Transformer language models are neural networks used for a wide variety of natural language tasks, including some that also require logical reasoning. However, a transformer model may easily learn spurious patterns in the data, short-circuiting actual reasoning. In this paper we investigate to what extent transformers can be trained to (a) approximate reasoning in propositional logic while (b) avoiding known reasoning shortcuts that arise from spurious correlations in the training data. To do so, we use a dataset with a known spurious correlation between the truth value of a problem and, for example, the number of rules in the problem. We augment the data with proofs and train two models: a generative transformer, WP-BART, trained on problems and their whole proofs, and a neuro-symbolic model, SIP-BART, trained on individual proof steps, which combines the generative transformer model BART with a symbolic proof checker. We find that SIP-BART succeeds in avoiding reasoning shortcuts, while WP-BART does not. For SIP-BART, we then identify a few remaining reasoning errors, not previously described in the literature, that arise from using a pre-trained language model. We analyse these errors qualitatively to create a taxonomy of four different types of additional pitfalls.
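The step-wise neuro-symbolic setup described in the abstract can be pictured, very roughly, as a loop in which a generator proposes one proof step at a time and a symbolic checker accepts only steps that are actually licensed by the problem. The following is a minimal sketch of that idea and not the authors' implementation. It assumes, purely for illustration, that problems consist of facts and if-then rules over propositional atoms. The names `check_step`, `prove`, and `first_applicable` are hypothetical, and `first_applicable` is a deterministic stand-in for the generative model, which in SIP-BART would be BART.

```python
from typing import Callable, FrozenSet, List, Optional, Set, Tuple

# Assumed representation: a rule is (body atoms, head atom); a proof step has the same shape.
Rule = Tuple[FrozenSet[str], str]
Step = Tuple[FrozenSet[str], str]
Proposer = Callable[[Set[str], List[Rule], str], Optional[Step]]


def check_step(step: Step, rules: List[Rule], known: Set[str]) -> bool:
    """Symbolic check: the step must match a rule of the problem,
    and all of its premises must already have been derived."""
    premises, _conclusion = step
    return step in rules and premises <= known


def prove(facts: Set[str], rules: List[Rule], query: str,
          propose: Proposer, max_steps: int = 20) -> Optional[List[Step]]:
    """Ask the proposer for single steps and keep only those the checker accepts.
    Returns the list of accepted steps if the query is derived, otherwise None."""
    known = set(facts)
    proof: List[Step] = []
    for _ in range(max_steps):
        if query in known:
            return proof
        step = propose(known, rules, query)
        if step is None:          # the proposer gives up
            break
        if check_step(step, rules, known):
            known.add(step[1])    # the conclusion becomes a derived fact
            proof.append(step)
        # steps rejected by the checker are simply discarded
    return proof if query in known else None


def first_applicable(known: Set[str], rules: List[Rule], query: str) -> Optional[Step]:
    """Hypothetical stand-in for the generative model: propose the first rule
    whose body is satisfied and whose head is not yet known."""
    for body, head in rules:
        if body <= known and head not in known:
            return (body, head)
    return None
```

In this sketch, a proposer that outputs an unsupported step, for instance one guessed from a surface feature of the problem, cannot advance the proof, because the checker rejects it. That is the intuition behind pairing a generative model with a symbolic checker, under the assumptions stated above.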