We present ReCAT, a recursive composition augmented Transformer that explicitly models the hierarchical syntactic structure of raw text without relying on gold trees during either learning or inference. Existing work in this line restricts data to follow a hierarchical tree structure and therefore lacks inter-span communication. To overcome this problem, we propose a novel contextual inside-outside (CIO) layer that learns contextualized span representations through a bottom-up pass and a top-down pass: the bottom-up pass forms representations of high-level spans by composing low-level spans, while the top-down pass combines information from inside and outside each span. By stacking several CIO layers between the embedding layer and the attention layers of a Transformer, ReCAT performs both deep intra-span and deep inter-span interactions, and thus generates multi-grained representations that are fully contextualized with other spans. Moreover, the CIO layers can be pre-trained jointly with Transformers, so ReCAT offers scalability, strong performance, and interpretability at the same time. We conduct experiments on various sentence-level and span-level tasks. Evaluation results indicate that ReCAT significantly outperforms vanilla Transformer models on all span-level tasks, and outperforms baselines that combine recursive networks with Transformers on natural language inference tasks. More interestingly, the hierarchical structures induced by ReCAT show strong consistency with human-annotated syntactic trees, indicating the good interpretability brought by the CIO layers.
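To make the bottom-up/top-down idea concrete, here is a toy sketch of an inside-outside pass over a full span chart. It is not ReCAT's CIO layer: the composition function `compose` (elementwise tanh of a sum) and the merge score `score` (a dot product) are hypothetical stand-ins for learned networks, the chart is the full O(n^3) one with no pruning, and the outside pass averages parent messages uniformly instead of weighting them. It only illustrates the data flow: inside vectors are built from children, and outside vectors are built from a parent's outside vector plus the sibling's inside vector.

```python
import math
from typing import Dict, List, Tuple

Vec = List[float]
Span = Tuple[int, int]  # inclusive token indices (i, j)


def compose(left: Vec, right: Vec) -> Vec:
    # Hypothetical composition function: elementwise tanh of the sum.
    return [math.tanh(a + b) for a, b in zip(left, right)]


def score(left: Vec, right: Vec) -> float:
    # Hypothetical merge score: dot product of the two children.
    return sum(a * b for a, b in zip(left, right))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def inside_pass(tokens: List[Vec]) -> Dict[Span, Vec]:
    """Bottom-up: build each span from all its split points, softly weighted."""
    n, d = len(tokens), len(tokens[0])
    inside: Dict[Span, Vec] = {(i, i): list(tok) for i, tok in enumerate(tokens)}
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length - 1
            candidates, scores = [], []
            for k in range(i, j):  # left child (i, k), right child (k + 1, j)
                left, right = inside[(i, k)], inside[(k + 1, j)]
                candidates.append(compose(left, right))
                scores.append(score(left, right))
            weights = softmax(scores)
            inside[(i, j)] = [sum(w * c[t] for w, c in zip(weights, candidates))
                              for t in range(d)]
    return inside


def outside_pass(inside: Dict[Span, Vec], n: int, d: int) -> Dict[Span, Vec]:
    """Top-down: mix each parent's outside vector with the sibling's inside vector."""
    outside: Dict[Span, Vec] = {(0, n - 1): [0.0] * d}  # root has no outside context
    for length in range(n - 1, 0, -1):  # longer spans (parents) first
        for i in range(0, n - length + 1):
            j = i + length - 1
            messages = []
            for k in range(j + 1, n):  # span is the left child of (i, k)
                messages.append(compose(outside[(i, k)], inside[(j + 1, k)]))
            for h in range(0, i):  # span is the right child of (h, j)
                messages.append(compose(outside[(h, j)], inside[(h, i - 1)]))
            outside[(i, j)] = [sum(m[t] for m in messages) / len(messages)
                               for t in range(d)]
    return outside


# Usage: three 2-dimensional toy token embeddings.
tokens = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]]
inside = inside_pass(tokens)
outside = outside_pass(inside, len(tokens), 2)
# Concatenate inside and outside vectors into one span representation.
contextual = {span: inside[span] + outside[span] for span in inside}
```

Concatenating the two vectors is just one simple way to give every span both its internal content and its surrounding context; in ReCAT these span representations are then passed on to Transformer attention layers, which this sketch does not model.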