Large Language Models (LLMs) have recently demonstrated exceptional performance on a variety of Natural Language Processing (NLP) tasks. They have also shown the ability to perform chain-of-thought (CoT) reasoning to solve complex problems. Recent studies have explored CoT reasoning in complex multimodal scenarios, such as the science question answering task, by fine-tuning multimodal models with high-quality human-annotated CoT rationales. However, collecting high-quality CoT rationales is usually time-consuming and costly. Furthermore, the annotated rationales are often inaccurate because they include redundant information or omit essential information. To address these issues, we propose a novel method termed \emph{T-SciQ} that aims at teaching science question answering with LLM signals. The T-SciQ approach generates high-quality CoT rationales as teaching signals and uses them to train much smaller models to perform CoT reasoning in complex modalities. Additionally, we introduce a novel data mixing strategy to produce more effective teaching data samples for simple and complex science question answering problems. Extensive experimental results show that our T-SciQ method achieves a new state-of-the-art performance on the ScienceQA benchmark, with an accuracy of 96.18%. Moreover, our approach outperforms the most powerful fine-tuned baseline by 4.5%.