Large Language Models (LLMs) have recently demonstrated exceptional performance on a variety of Natural Language Processing (NLP) tasks. They have also shown the ability to perform chain-of-thought (CoT) reasoning to solve complex problems. Recent studies have explored CoT reasoning in complex multimodal scenarios, such as the science question answering task, by fine-tuning multimodal models with high-quality human-annotated CoT rationales. However, collecting high-quality CoT rationales is usually time-consuming and costly. Moreover, the annotated rationales are often inaccurate because they include redundant information or omit essential information. To address these issues, we propose a novel method termed \emph{T-SciQ} that aims at teaching science question answering with LLM signals. The T-SciQ approach generates high-quality CoT rationales as teaching signals and then uses them to train much smaller models to perform CoT reasoning in complex modalities. Additionally, we introduce a novel data mixing strategy to produce more effective teaching data samples for simple and complex science question answering problems. Extensive experimental results show that our T-SciQ method achieves new state-of-the-art performance on the ScienceQA benchmark, with an accuracy of 96.18%. Moreover, our approach outperforms the most powerful fine-tuned baseline by 4.5%.