Automated stance detection and related machine learning methods can provide useful insights for media monitoring and academic research. Many of these approaches require annotated training datasets, which limits their applicability for languages where such datasets may not be readily available. This paper explores the applicability of large language models for automated stance detection in a challenging scenario involving a morphologically complex, lower-resource language and a socio-culturally complex topic, immigration. If the approach works in this case, it can be expected to perform as well or better in less demanding scenarios. We annotate a large set of pro- and anti-immigration examples and compare the performance of multiple language models as supervised learners. We also probe the usability of ChatGPT as an instructable zero-shot classifier for the same task. The supervised approach achieves acceptable performance, and ChatGPT yields similar accuracy. The zero-shot approach is therefore promising as a potentially simpler and cheaper alternative for text classification tasks, including in lower-resource languages. We further use the best-performing model to investigate diachronic trends over seven years in two corpora of Estonian mainstream and right-wing populist news sources, demonstrating the applicability of the approach for news analytics and media monitoring settings, and discuss correspondences between stance changes and real-world events.