Large Language Models (LLMs) often struggle with tasks requiring mathematical reasoning, particularly multiple-choice questions (MCQs). To address this, we developed LLaMa-SciQ, an educational chatbot designed to help college students solve and understand MCQs in STEM fields. We first fine-tune candidate models and align them to human preferences. After comparing the performance of Mistral-7B and LLaMa-8B, we selected the latter as the base model due to its higher evaluation accuracy. To further improve accuracy, we implement Retrieval-Augmented Generation (RAG) and apply quantization to compress the model, reducing inference time and making the system more accessible to students. On mathematical reasoning, LLaMa-SciQ achieves 74.5% accuracy on the GSM8k dataset and 30% on the MATH dataset. However, RAG does not improve performance and even degrades it, likely due to retriever issues or the model's unfamiliarity with the retrieved context. Despite this, the quantized model loses only about 5% in performance, demonstrating significant efficiency gains.
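To make the quantization claim concrete, the following is a minimal, self-contained sketch of the core idea behind weight quantization (symmetric per-tensor int8 here); it is an illustrative toy, not the actual compression pipeline used for LLaMa-SciQ:

```python
def quantize_int8(weights):
    """Map float weights symmetrically onto the int8 range [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [q * scale for q in quantized]

# Toy weight vector standing in for a model tensor.
weights = [0.8, -1.27, 0.03, 0.5]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Rounding error per weight is bounded by half the quantization step.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
```

Storing 8-bit integers instead of 16- or 32-bit floats shrinks the model and speeds up inference, at the cost of the bounded rounding error shown above; the abstract's reported ~5% accuracy drop reflects this trade-off at model scale.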