We present ReCAT, a recursive composition augmented Transformer that can explicitly model the hierarchical syntactic structures of raw text without relying on gold trees during either learning or inference. Existing research along this line restricts data to follow a hierarchical tree structure and thus lacks inter-span communication. To overcome this problem, we propose a novel contextual inside-outside (CIO) layer that learns contextualized representations of spans through bottom-up and top-down passes: a bottom-up pass forms representations of high-level spans by composing low-level spans, while a top-down pass combines information inside and outside a span. By stacking several CIO layers between the embedding layer and the attention layers of a Transformer, ReCAT can perform both deep intra-span and deep inter-span interactions, and thus generate multi-grained representations that are fully contextualized with other spans. Moreover, the CIO layers can be jointly pre-trained with Transformers, giving ReCAT scalability, strong performance, and interpretability at the same time. We conduct experiments on various sentence-level and span-level tasks. Evaluation results indicate that ReCAT significantly outperforms vanilla Transformer models on all span-level tasks, as well as baselines that combine recursive networks with Transformers on natural language inference tasks. More interestingly, the hierarchical structures induced by ReCAT exhibit strong consistency with human-annotated syntactic trees, indicating the good interpretability brought by the CIO layers.
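The bottom-up and top-down passes described above can be illustrated with a minimal sketch. This is not the ReCAT implementation: the function names `compose`, `mean`, and `inside_outside` are hypothetical, composition is assumed to be a plain element-wise mean, and all split points are weighted uniformly, whereas a trained model would presumably use learned composition functions and learned split weights.

```python
# Toy inside-outside passes over a span chart; NOT the ReCAT implementation.
# Assumptions: compose() is a plain element-wise mean and every split is
# weighted uniformly, where a real model would use learned functions.

def compose(a, b):
    """Merge two span vectors (hypothetical stand-in for a learned cell)."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]

def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def inside_outside(tokens):
    """Return {(i, j): inside vector + outside vector} for every span."""
    n, dim = len(tokens), len(tokens[0])
    inside, outside = {}, {}
    # Bottom-up pass: build each span (i, j) from its shorter sub-spans.
    for i in range(n):
        inside[(i, i)] = list(tokens[i])
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width - 1
            inside[(i, j)] = mean(
                [compose(inside[(i, k)], inside[(k + 1, j)]) for k in range(i, j)]
            )
    # Top-down pass: the root span has no outside context; every other span
    # combines a parent's outside vector with its sibling's inside vector.
    outside[(0, n - 1)] = [0.0] * dim
    for width in range(n - 1, 0, -1):
        for i in range(n - width + 1):
            j = i + width - 1
            # Parents that extend the span to the right (sibling on the right).
            ctx = [compose(outside[(i, k)], inside[(j + 1, k)]) for k in range(j + 1, n)]
            # Parents that extend the span to the left (sibling on the left).
            ctx += [compose(outside[(k, j)], inside[(k, i - 1)]) for k in range(i)]
            outside[(i, j)] = mean(ctx)
    # Concatenate what is inside each span with what is outside it.
    return {span: inside[span] + outside[span] for span in inside}
```

Calling `inside_outside([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])` yields one vector per span of the three-token input. Each vector concatenates the span's inside representation with a summary of its outside context, which is the sense in which a span representation here depends on other spans.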