We audited counter-arguments generated by large language models (LLMs), focusing on their ability to generate evidence-based and stylistic counter-arguments to posts from the Reddit ChangeMyView dataset. Our evaluation is based on Counterfire, a new dataset of 32,000 counter-arguments generated by GPT-3.5 Turbo and Koala, their fine-tuned variants, and PaLM 2, with prompts varied for evidence use and argumentative style. GPT-3.5 Turbo ranked highest in argument quality, with strong paraphrasing and style adherence, particularly in `reciprocity' style arguments. However, the `No Style' counter-arguments proved most persuasive on average. The findings suggest that a balance between evidentiality and stylistic elements is vital to a compelling counter-argument. We close with a discussion of future research directions and implications for fine-tuning LLMs.