Automatic medical question summarization can significantly help a system understand consumer health questions and retrieve correct answers. Seq2Seq models trained with maximum likelihood estimation (MLE) have been applied to this task, but they face two general problems: the model cannot capture the question focus well, and the traditional MLE strategy lacks the ability to understand sentence-level semantics. To alleviate these problems, we propose a novel question focus-driven contrastive learning framework (QFCL). Specifically, we propose an easy and effective approach to generate hard negative samples based on the question focus, and exploit contrastive learning at both the encoder and the decoder to obtain better sentence-level representations. On three medical benchmark datasets, our proposed model achieves new state-of-the-art results, with performance gains of 5.33, 12.85 and 3.81 points over the baseline BART model on the three datasets respectively. Further human judgement and detailed analysis show that our QFCL model learns better sentence representations with the ability to distinguish different sentence meanings, and generates high-quality summaries by capturing the question focus.
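As an illustration of why focus-based hard negatives matter, the following is a minimal sketch of a contrastive objective. It assumes an InfoNCE-style loss over cosine similarities; the abstract does not state the exact loss used by QFCL, so the function names, the temperature value, and the toy vectors are all hypothetical stand-ins for real encoder or decoder representations.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumption, not necessarily the QFCL loss).

    Pulls the anchor towards the positive and pushes it away from
    each negative; lower is better.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy sentence vectors (hypothetical; real ones would come from the model).
question = [1.0, 0.2, 0.0]        # representation of the consumer question
summary = [0.9, 0.3, 0.1]         # representation of its reference summary
hard_negative = [0.8, 0.1, 0.6]   # e.g. the question with its focus altered
easy_negative = [-0.5, 0.9, 0.0]  # an unrelated question

loss_easy = contrastive_loss(question, summary, [easy_negative])
loss_hard = contrastive_loss(question, summary, [hard_negative])
print(f"easy negative loss: {loss_easy:.6f}")
print(f"hard negative loss: {loss_hard:.6f}")
```

In this toy setting the hard negative sits close to the anchor, so it yields a larger loss than the unrelated negative and therefore a stronger training signal, which is the intuition behind generating negatives from the question focus.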