The computational treatment of arguments on controversial issues has been subject to extensive NLP research, due to its envisioned impact on opinion formation, decision making, writing education, and the like. A critical task in any such application is the assessment of an argument's quality, but it is also a particularly challenging one. In this position paper, we start from a brief survey of argument quality research, where we identify the diversity of quality notions and the subjectivity of their perception as the main hurdles towards substantial progress on argument quality assessment. We argue that the capabilities of instruction-following large language models (LLMs) to leverage knowledge across contexts enable a much more reliable assessment. Rather than merely being fine-tuned to chase leaderboards on assessment tasks, LLMs need to be instructed systematically with argumentation theories and scenarios as well as with ways to solve argument-related problems. We discuss the real-world opportunities and ethical issues that emerge from this.