Textbook question answering (TQA) is a challenging task in artificial intelligence due to its complex context and multimodal data. Although previous research has significantly improved performance on the task, limitations remain, including weak model reasoning and an inability to capture contextual information in lengthy contexts. The introduction of large language models (LLMs) has revolutionized the field of AI; however, directly applying LLMs often yields inaccurate answers. This paper proposes a methodology that handles the out-of-domain scenario in TQA, where concepts are spread across different lessons, by incorporating the retrieval-augmented generation (RAG) technique, and that leverages transfer learning to handle long contexts and enhance reasoning abilities. Through supervised fine-tuning of the LLM Llama-2 and the incorporation of RAG, our architecture outperforms the baseline, achieving accuracy improvements of 4.12% on the validation set and 9.84% on the test set for non-diagram multiple-choice questions.
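To make the RAG component concrete, the sketch below shows the generic retrieve-then-prompt pattern the abstract refers to: score candidate lessons against the question, keep the top-k, and prepend them as context before the question. This is a minimal illustration, not the paper's implementation; the bag-of-words cosine similarity stands in for a real embedding model, and the function names (`retrieve`, `build_prompt`) are hypothetical.

```python
# Minimal sketch of the retrieval step in a RAG pipeline.
# Assumption: bag-of-words cosine similarity stands in for a
# learned embedding model; a real system would use dense vectors.
from collections import Counter
import math


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(question: str, lessons: list[str], k: int = 2) -> list[str]:
    """Return the k lessons most similar to the question."""
    q = Counter(question.lower().split())
    ranked = sorted(
        lessons,
        key=lambda lesson: cosine(q, Counter(lesson.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, lessons: list[str], k: int = 2) -> str:
    """Prepend the retrieved lessons as context for the LLM."""
    context = "\n".join(retrieve(question, lessons, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

The resulting prompt string would then be passed to the fine-tuned model; restricting the context to the top-k lessons is what keeps the input within the model's context window when concepts are spread across many lessons.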