Available corpora for Argument Mining differ along several axes, and one key difference is the presence (or absence) of discourse markers that signal argumentative content. Effective ways to use discourse markers have received wide attention in various discourse parsing tasks, where discourse markers are well known to be strong indicators of discourse relations. To improve the robustness of Argument Mining systems across different genres, we propose to automatically augment a given text with discourse markers such that all relations are explicitly signaled. Our analysis shows that popular language models taken out-of-the-box fail on this task; however, when fine-tuned on a new heterogeneous dataset that we construct (including synthetic and real examples), they perform considerably better. We demonstrate the impact of our approach on an Argument Mining downstream task, evaluated on different corpora, showing that language models can be trained to automatically fill in discourse markers across different corpora, improving the performance of a downstream model in some, but not all, cases. Our proposed approach can further be employed as an assistive tool for better discourse understanding.