We introduce FarExStance, a new dataset for explainable stance detection in Farsi. Each instance contains a claim, the stance of an article or social media post towards that claim, and an extractive explanation that provides evidence for the stance label. We compare a fine-tuned multilingual RoBERTa model with several large language models in zero-shot, few-shot, and parameter-efficient fine-tuned settings on our new dataset. On stance detection, the most accurate models are the fine-tuned RoBERTa model, Aya-23-8B adapted with parameter-efficient fine-tuning, and few-shot Claude-3.5-Sonnet. Regarding explanation quality, our automatic evaluation metrics indicate that few-shot GPT-4o generates the most coherent explanations, while our human evaluation shows that few-shot Claude-3.5-Sonnet achieves the best Overall Explanation Score (OES). The fine-tuned Aya-23-8B model produced explanations most closely aligned with the reference explanations.