Implicit discourse relation recognition is a challenging task in discourse analysis due to the absence of explicit discourse connectives between text spans. Although recent pre-trained language models have achieved great success on this task, there has been no fine-grained analysis of their performance, so the difficulties of the task and promising research directions remain unclear. In this paper, we analyze model predictions in depth, aiming to identify what makes the task difficult for pre-trained language models and where future work might focus. In addition to this in-depth analysis, we semi-manually annotate data to supply relatively high-quality examples for the relations with few annotated instances in PDTB 3.0. The annotated data significantly improves implicit discourse relation recognition for level-2 senses.