Complex feature extractors are widely used to build text representations. However, they make NLP systems prone to overfitting, especially when the downstream training datasets are relatively small, as is the case for several discourse parsing tasks. We therefore propose an alternative lightweight neural architecture that removes multiple complex feature extractors and uses only learnable self-attention modules to indirectly exploit pretrained neural language models, so as to preserve the generalizability of those models as much as possible. Experiments on three common discourse parsing tasks show that, powered by recent pretrained language models, the lightweight architecture consisting of only two self-attention layers obtains much better generalizability and robustness. It also achieves comparable or even better system performance with fewer learnable parameters and less processing time.
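The core idea, two learnable self-attention layers stacked on top of features from a pretrained language model, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: `frozen_encoder` is a hypothetical stand-in that returns fixed pseudo-embeddings instead of calling a real pretrained model, the encoder is assumed to be kept frozen, and the layers are single-head with a residual connection and no feed-forward block. The names `SelfAttentionLayer` and `LightweightHead` are likewise hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random weights; a real system would learn these by gradient descent."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def frozen_encoder(tokens, dim):
    """Hypothetical stand-in for a frozen pretrained LM (assumption):
    each token maps to a fixed pseudo-embedding that is never updated."""
    features = []
    for tok in tokens:
        tok_rng = random.Random(tok)  # seeded by the token string, so deterministic
        features.append([tok_rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return features


class SelfAttentionLayer:
    """Single-head scaled dot-product self-attention with a residual connection."""

    def __init__(self, dim, rng):
        self.dim = dim
        self.w_q = random_matrix(dim, dim, rng)
        self.w_k = random_matrix(dim, dim, rng)
        self.w_v = random_matrix(dim, dim, rng)

    def num_parameters(self):
        # Three dim x dim projection matrices (query, key, value).
        return 3 * self.dim * self.dim

    def forward(self, x):
        q, k, v = matmul(x, self.w_q), matmul(x, self.w_k), matmul(x, self.w_v)
        k_t = [list(col) for col in zip(*k)]
        scores = matmul(q, k_t)
        weights = [softmax([s / math.sqrt(self.dim) for s in row]) for row in scores]
        attended = matmul(weights, v)
        # Residual connection keeps the pretrained features directly accessible.
        return [[a + b for a, b in zip(att, inp)] for att, inp in zip(attended, x)]


class LightweightHead:
    """Two stacked self-attention layers: the only learnable part in this sketch."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.layers = [SelfAttentionLayer(dim, rng), SelfAttentionLayer(dim, rng)]

    def num_parameters(self):
        return sum(layer.num_parameters() for layer in self.layers)

    def forward(self, features):
        for layer in self.layers:
            features = layer.forward(features)
        return features


if __name__ == "__main__":
    dim = 8
    feats = frozen_encoder(["the", "parser", "works"], dim)
    head = LightweightHead(dim)
    out = head.forward(feats)
    print(len(out), len(out[0]), head.num_parameters())
```

In this toy setting the learnable part has only 2 × 3 × dim² parameters, which gives a sense of why such a head is small compared with the stacks of task-specific feature extractors it replaces. A task-specific output layer (for example, a classifier over discourse relations) would sit on top of `LightweightHead` and is omitted here.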