Annotating political discourse is resource-intensive, but recent developments in NLP promise to automate complex annotation tasks. Fine-tuned transformer-based models outperform human annotators in some annotation tasks, but they require large, manually annotated training datasets. In our contribution, we explore to what extent a manually annotated dataset can be automatically replicated with today's NLP methods, using unsupervised machine learning as well as zero- and few-shot learning.