One of the key factors in language productivity and human cognition is systematic compositionality: the ability to understand unseen compositions of previously seen primitives. However, recent evidence shows that Transformers have difficulty generalizing to composed contexts built from seen primitives. To address this, we take the first step by proposing a compositionality-aware Transformer, called CAT, together with two novel pre-training tasks that facilitate systematic compositionality. We tentatively provide a successful implementation of a multi-layer CAT based on the widely used BERT. Experimental results demonstrate that CAT outperforms baselines on compositionality-aware tasks, with minimal impact on its effectiveness on standardized language understanding tasks.