Neural abstractive summarization has been widely studied and has achieved great success with large-scale corpora. However, the considerable cost of annotating data motivates learning strategies for low-resource settings. In this paper, we investigate the problems of learning summarizers from only a few examples and propose corresponding methods for improvement. First, typical transfer learning methods are susceptible to the data properties and learning objectives of the pretext tasks. Therefore, building on pretrained language models, we present a meta-learning framework that transfers few-shot learning processes from source corpora to the target corpus. Second, previous methods learn from training examples without decomposing content and preference. The generated summaries can therefore be constrained by the preference bias in the training set, especially in low-resource settings. As such, we propose decomposing content and preferences during learning through parameter modulation, which enables control over preferences during inference. Third, given a target application, specifying the required preferences can be non-trivial because they may be difficult to derive through observation. Therefore, we propose a novel decoding method that automatically estimates suitable preferences and generates corresponding summary candidates from the few training examples. Extensive experiments demonstrate that our methods achieve state-of-the-art performance on six diverse corpora, with average improvements of 30.11%/33.95%/27.51% and 26.74%/31.14%/24.48% on ROUGE-1/2/L under the 10- and 100-example settings, respectively.
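To make the reported ROUGE-1/2 figures concrete, the following is a minimal sketch of an n-gram overlap F1 score. It is a simplification and not the evaluation code behind the reported numbers: it uses lowercased whitespace tokenization with no stemming, and the function names `ngrams`, `rouge_n_f1`, and `relative_improvement` are hypothetical. The helper for relative improvement assumes the percentages above are relative gains over a baseline score, which the abstract does not state explicitly.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n=1):
    """Simplified ROUGE-N F1: clipped n-gram overlap, whitespace tokens, no stemming."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipped overlap).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new, baseline):
    """Percentage gain of `new` over `baseline` (assumed interpretation)."""
    return 100.0 * (new - baseline) / baseline
```

For example, `rouge_n_f1("the cat sat on the mat", "the cat lay on the mat", n=2)` compares the bigrams of the two sentences, and `relative_improvement` turns a pair of such scores into a percentage gain.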