This paper examines the ability of large language models to participate in educational guided reading. Specifically, we evaluate their ability to generate meaningful questions from the input text, to generate questions that are diverse in both content coverage and difficulty, and to recommend the part of the text that a student should re-read based on the student's responses to the questions. Based on our evaluation of ChatGPT and Bard, we report that 1) large language models are able to generate high-quality, meaningful questions that have high correlation with the input text; 2) they generate diverse questions that cover most topics in the input text, even though this ability degrades significantly as the length of the input text increases; 3) they are able to generate both low and high cognitive questions, even though they are significantly biased toward low cognitive questions; and 4) they are able to effectively summarize responses and extract a portion of the text that should be re-read.