Large Language Models can reasonably understand and generate human expressions but may lack thorough thinking and reasoning mechanisms. Recently, several studies have sought to enhance the thinking ability of language models, but most of them are not data-driven or training-based. In this paper, motivated by cognitive mechanisms in the natural world, we design a novel model architecture called TaS, which first considers the thoughts and then expresses the response based upon the query. We design several pipelines to annotate or generate thought contents from prompt-response samples, then add a language head in a middle layer, which behaves as the thinking layer. We train the language model on the thoughts-augmented data, enabling the thinking layer to automatically generate reasonable thoughts and finally output more reasonable responses. Both qualitative examples and quantitative results validate the effectiveness and performance of TaS. Our code is available at https://anonymous.4open.science/r/TadE.
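The intermediate-layer language head described above can be illustrated with a minimal sketch. This is a hypothetical toy implementation, not the authors' actual code: all names, layer sizes, and the tanh layers are illustrative assumptions. A "thinking" language head is attached to a middle layer's hidden state to decode thought tokens, while the final layer decodes the response; training sums the two cross-entropy losses.

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab, n_layers = 16, 32, 4   # assumed toy dimensions
thinking_layer = 1               # assumed index of the middle "thinking" layer

# A toy stack of layers plus two language heads: one tapped mid-stack for
# thoughts, one at the top for the response (illustrative, not TaS itself).
layers = [rng.normal(0, 0.1, (d, d)) for _ in range(n_layers)]
thought_head = rng.normal(0, 0.1, (d, vocab))
response_head = rng.normal(0, 0.1, (d, vocab))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(x):
    """Return (thought_logits, response_logits) for a hidden input vector x."""
    h = x
    thought_logits = None
    for i, W in enumerate(layers):
        h = np.tanh(h @ W)
        if i == thinking_layer:          # tap the hidden state mid-stack
            thought_logits = h @ thought_head
    return thought_logits, h @ response_head

def joint_loss(x, thought_tok, resp_tok):
    """Sum of thought and response cross-entropies (the joint objective)."""
    tl, rl = forward(x)
    return -np.log(softmax(tl)[thought_tok]) - np.log(softmax(rl)[resp_tok])

x = rng.normal(0, 1, d)
total = joint_loss(x, thought_tok=3, resp_tok=7)
```

At inference, the same mid-stack head would first be decoded to produce the thought, and the top head would then produce the response conditioned on it.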