Arguments evoke emotions, which in turn influence the effect of the argument itself. Not only the intensity of an emotion but also its category influences the argument's effects, for instance, the willingness to adapt stances. While binary emotionality has been studied in arguments, there is no work on discrete emotion categories (e.g., "Anger") in such data. To fill this gap, we crowdsource subjective annotations of emotion categories in a German argument corpus and evaluate automatic LLM-based labeling methods. Specifically, we compare three prompting strategies (zero-shot, one-shot, chain-of-thought) on three large instruction-tuned language models (Falcon-7b-instruct, Llama-3.1-8B-instruct, GPT-4o-mini). We further vary the definition of the output space to be binary (is there emotionality in the argument?), closed-domain (which emotion from a given label set is in the argument?), or open-domain (which emotion is in the argument?). We find that emotion categories enhance the prediction of emotionality in arguments, emphasizing the need for discrete emotion annotations in arguments. Across all prompt settings and models, automatic predictions show high recall but low precision for anger and fear, indicating a strong bias toward negative emotions.
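To make the closing observation concrete, the following minimal sketch shows how per-category precision and recall can expose over-prediction of a label. It is an illustration under stated assumptions, not the evaluation code used in this work: the function name `per_label_precision_recall`, the single-label setup, and the toy gold and predicted labels are all hypothetical and invented for this example.

```python
from collections import Counter


def per_label_precision_recall(gold, predicted):
    """Return {label: (precision, recall)} for single-label predictions.

    Hypothetical helper for illustration; assumes one label per argument.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted label was wrong for this item
            fn[g] += 1  # the gold label was missed for this item

    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        pred_count = tp[label] + fp[label]
        gold_count = tp[label] + fn[label]
        precision = tp[label] / pred_count if pred_count else 0.0
        recall = tp[label] / gold_count if gold_count else 0.0
        scores[label] = (precision, recall)
    return scores


# Toy labels, invented for illustration only (not taken from the corpus).
gold = ["joy", "anger", "sadness", "joy", "fear", "none"]
pred = ["anger", "anger", "anger", "joy", "fear", "fear"]

scores = per_label_precision_recall(gold, pred)
for label, (precision, recall) in scores.items():
    print(f"{label:8s} precision={precision:.2f} recall={recall:.2f}")
```

In this toy example, every gold instance of anger and fear is recovered (high recall), but both labels are also assigned to arguments that carry other labels (low precision), which is the pattern one would expect from a model biased toward negative emotions.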