Large language models (LLMs) have recently demonstrated exceptional performance on a variety of natural language processing (NLP) tasks. They have also shown the ability to perform chain-of-thought (CoT) reasoning to solve complex problems. Recent studies have explored CoT reasoning in complex multimodal scenarios, such as the science question answering task, by fine-tuning multimodal models on high-quality human-annotated CoT rationales. However, collecting high-quality CoT rationales is usually time-consuming and costly. Moreover, the annotated rationales are often inaccurate because essential external information is missing. To address these issues, we propose a novel method termed \emph{T-SciQ} that aims at teaching science question answering with LLM signals. The T-SciQ approach generates high-quality CoT rationales as teaching signals and uses them to train much smaller models to perform CoT reasoning in complex modalities. Additionally, we introduce a novel data mixing strategy that uses a policy to produce more effective teaching data samples for simple and complex science question answering problems. Extensive experimental results show that our T-SciQ method achieves new state-of-the-art performance on the ScienceQA benchmark, with an accuracy of 96.18\%. Moreover, our approach outperforms the most powerful fine-tuned baseline by 4.5\%.