Chain-of-thought (CoT) prompting enables large language models (LLMs) to solve complex reasoning tasks by generating an explanation before the final prediction. Despite its promising ability, a critical downside of CoT prompting is that performance is greatly affected by the factuality of the generated explanation. Improving the correctness of explanations requires fine-tuning language models on explanation data. However, only a few datasets exist that can be used for such approaches, and there is no data collection tool for building them. Thus, we introduce CoTEVer, a toolkit for annotating the factual correctness of generated explanations and collecting revision data for wrong explanations. Furthermore, we suggest several use cases where the data collected with CoTEVer can be utilized to enhance the faithfulness of explanations. Our toolkit is publicly available at https://github.com/SeungoneKim/CoTEVer.