Responding to the thousands of student questions posted on online QA platforms each semester carries a considerable human cost, particularly in computing courses with rapidly growing enrollments. To address the challenges of scalable and intelligent question-answering (QA), we introduce a solution that leverages open-source Large Language Models (LLMs) from the LLaMA-2 family to ensure data privacy. Our approach combines augmentation techniques such as retrieval-augmented generation (RAG), supervised fine-tuning (SFT), and learning from human preference data using Direct Preference Optimization (DPO). Through extensive experimentation on a Piazza dataset from an introductory CS course, comprising 10,000 QA pairs and 1,500 pairs of preference data, we demonstrate a significant 30% improvement in the quality of answers, with RAG being a particularly impactful addition. Our contributions include the development of a novel architecture for educational QA, extensive evaluations of LLM performance using both human assessments and LLM-based metrics, and insights into the challenges and future directions of educational data processing. This work paves the way for the development of CHATA, an intelligent QA assistant customizable for courses with an online QA platform.