Transformer architectures have achieved great success on natural language tasks by learning strong language representations from large-scale unlabeled text. In this paper, we go a step further and explore a new logical inductive bias for better language representation learning. Logical reasoning is a formal methodology for deriving answers from given knowledge and facts. Inspired by this view, we develop a novel neural architecture, FOLNet (First-Order Logic Network), to encode this new inductive bias. We construct a set of neural logic operators as learnable Horn clauses, which are then forward-chained into a fully differentiable neural architecture (FOLNet). Interestingly, we find that the self-attention module in transformers can be composed from two of our neural logic operators, which may explain the strong reasoning performance of transformers. FOLNet has the same input and output interfaces as other pretrained models and can therefore be pretrained and finetuned with similar losses. This also allows FOLNet to be used in a plug-and-play manner as a replacement for other pretrained models. With our logical inductive bias, the same set of ``logic deduction skills'' learned through pretraining is expected to be equally capable of solving diverse downstream tasks. For this reason, FOLNet learns language representations with much stronger transfer capabilities. Experimental results on several language understanding tasks show that our pretrained FOLNet model outperforms existing strong transformer-based approaches.
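To make the notion of forward-chaining Horn clauses concrete, the sketch below shows the classical, purely symbolic version of the idea on propositional atoms. It is only an illustration of the logical background, not FOLNet's neural logic operators, which are learnable and differentiable; the names `forward_chain`, `Clause`, and the toy knowledge base are hypothetical and introduced here for illustration.

```python
from typing import FrozenSet, List, Set, Tuple

# A propositional Horn clause as (body, head),
# read as "body_1 AND ... AND body_n -> head".
Clause = Tuple[FrozenSet[str], str]


def forward_chain(facts: Set[str], clauses: List[Clause]) -> Set[str]:
    """Apply the clauses repeatedly until no new atom can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in clauses:
            # Fire a clause when every body atom is already known.
            if head not in known and body <= known:
                known.add(head)
                changed = True
    return known


# Hypothetical toy knowledge base, for illustration only.
clauses = [
    (frozenset({"bird", "healthy"}), "can_fly"),
    (frozenset({"can_fly"}), "can_travel"),
]
derived = forward_chain({"bird", "healthy"}, clauses)
print(sorted(derived))
```

Here the second clause can only fire after the first has produced `can_fly`, which is the chaining behavior the abstract refers to; in the symbolic setting the rules are fixed by hand, whereas the abstract describes operators whose clauses are learned.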