Recent advancements in large language models (LLMs) have facilitated the development of chatbots with sophisticated conversational capabilities. However, LLMs frequently give inaccurate responses to queries, which hinders their application in educational settings. In this paper, we investigate the effectiveness of integrating a knowledge base (KB) with LLM intelligent tutors to increase response reliability. To achieve this, we design a scalable KB that lets educational supervisors seamlessly integrate lesson curricula, which the intelligent tutoring system then processes automatically. We then detail an evaluation in which student participants were asked to answer questions about the artificial intelligence curriculum. GPT-4 intelligent tutors with varying levels of KB access and human domain experts then assessed these responses. Lastly, students compared the intelligent tutors' responses with those of the domain experts and ranked their various pedagogical abilities. Results suggest that, although these intelligent tutors are still less accurate than domain experts, their accuracy increases when they are granted access to a KB. We also observe that the intelligent tutors with KB access outperform domain experts in two pedagogical abilities, speaking like a teacher and understanding students, while their ability to help students still lags behind that of domain experts.