Argument Mining, the task of extracting argumentative sentences on a specific topic from large document sources, is inherently difficult for machine learning models and humans alike: large Argument Mining datasets are rare, and recognizing argumentative sentences requires expert knowledge. The task becomes even harder when it also involves detecting the stance of the retrieved arguments. Given the cost and complexity of creating suitably large Argument Mining datasets, we ask whether ever-larger datasets are necessary for acceptable performance. Our findings show that, with carefully composed training samples and a model pretrained on related tasks, we can reach 95% of the maximum performance while reducing the training sample size by at least 85%. This gain is consistent across three Argument Mining tasks on three different datasets. We also publish a new dataset for future benchmarking.