MIRROR: A Novel Approach for the Automated Evaluation of Open-Ended Question Generation

Evaluating automatically generated questions is a critical task that requires judging question quality on factors such as engagement, pedagogical value, and the ability to stimulate critical thinking. These aspects require human-like understanding and judgment, which automated systems currently lack. However, human evaluation is costly and impractical for large samples of generated questions. We therefore propose MIRROR (Multi-LLM Iterative Review and Response for Optimized Rating), a novel system that leverages large language models (LLMs) to automate the evaluation of questions produced by automated question generation systems. We experimented with several state-of-the-art LLMs, such as GPT-4, Gemini, and Llama2-70b. We observed that scores on the human evaluation metrics (relevance, appropriateness, novelty, complexity, and grammaticality) improved with the feedback-based MIRROR approach, tending to be closer to the human baseline scores. Furthermore, Pearson's correlation coefficient between GPT-4 and human experts improved with MIRROR compared to direct prompting for evaluation. Error analysis shows that MIRROR significantly helps to improve relevance and appropriateness.
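The abstract reports agreement between LLM and human ratings as Pearson's correlation coefficient. As a minimal sketch of that comparison, the function below computes the coefficient from two equal-length score lists using only the standard library. The function name `pearson_r` and the score lists in the usage lines are hypothetical placeholders chosen for illustration; they are not data or code from the paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient between two score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings for five questions on one metric (not from the paper).
human_scores = [4, 3, 5, 2, 4]
direct_prompt_scores = [5, 4, 4, 4, 3]
feedback_scores = [4, 3, 5, 3, 4]

print("direct prompting:", pearson_r(human_scores, direct_prompt_scores))
print("feedback-based:  ", pearson_r(human_scores, feedback_scores))
```

A coefficient closer to 1 means the automated ratings rise and fall with the human ratings more consistently, which is the sense in which the abstract says the correlation "improved".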

