The growing body of political texts opens up new opportunities for rich insights into political dynamics and ideologies but also increases the workload for manual analysis. Automated speaker attribution, which detects who said what to whom in a speech event and is closely related to semantic role labeling, is an important processing step for computational text analysis. We study the potential of the Llama 2 family of large language models to automate speaker attribution in German parliamentary debates from 2017 to 2021. We fine-tune Llama 2 with QLoRA, an efficient training strategy, and find that our approach achieves competitive performance in the GermEval 2023 Shared Task On Speaker Attribution in German News Articles and Parliamentary Debates. Our results shed light on the capabilities of large language models in automating speaker attribution, revealing a promising avenue for computational analysis of political discourse and the development of semantic role labeling systems.