Argument Mining, the task of extracting and classifying argument components for a specific topic from large document sources, is inherently difficult for machine learning models and humans alike: large Argument Mining datasets are rare, and recognizing argument components requires expert knowledge. The task becomes even harder when it also involves stance detection of the retrieved arguments. In this work, we investigate the effect of Argument Mining dataset composition in few- and zero-shot settings. Our findings show that, while fine-tuning is mandatory to achieve acceptable model performance, carefully composed training samples allow the training set size to be reduced by almost 90% while still yielding 95% of the maximum performance. This gain is consistent across three Argument Mining tasks on three different datasets. We also publish a new dataset for future benchmarking.