Chain of thought (CoT) has proven useful for problems that require complex reasoning. Many of these problems involve both textual and multimodal inputs. Given inputs in different modalities, a model generates a rationale and then uses it to answer a question. Because of hallucination, some generated rationales are soft negatives: they have high textual quality but illogical semantics, and so do not always improve answer accuracy. This study proposes a rationale generation method using soft negative sampling (SNSE-CoT) to mitigate hallucinations in multimodal CoT. Five methods were applied to generate soft negative samples that share highly similar text with the original rationale but differ in semantics. Bidirectional margin loss (BML) was applied to introduce these samples into the traditional contrastive learning framework, which involves only positive and negative samples. Extensive experiments on the ScienceQA dataset demonstrated the effectiveness of the proposed method. Code and data are released at https://github.com/zgMin/SNSE-CoT.
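The role of BML can be illustrated with a minimal sketch. The abstract does not state the loss formula, so the form below is an assumption borrowed from a common formulation of bidirectional margin loss in sentence-embedding work: the similarity gap between a soft negative and a positive is kept inside a band rather than pushed as far apart as possible. The function names, the toy 2-D vectors, and the margin values `alpha` and `beta` are hypothetical and chosen only for illustration; they are not taken from the released code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bidirectional_margin_loss(anchor, positive, soft_negative,
                              alpha=0.1, beta=0.3):
    """Assumed BML form: ReLU(delta + alpha) + ReLU(-delta - beta).

    delta = cos(anchor, soft_negative) - cos(anchor, positive).
    The loss is zero when -beta <= delta <= -alpha, i.e. when the soft
    negative is less similar to the anchor than the positive is, but
    not pushed as far away as an unrelated negative would be.
    alpha and beta are illustrative values, not the paper's settings.
    """
    delta = cosine(anchor, soft_negative) - cosine(anchor, positive)
    return max(0.0, delta + alpha) + max(0.0, -delta - beta)


# Toy 2-D "embeddings" (hypothetical values).
anchor = [1.0, 0.0]
positive = [1.0, 0.0]

in_band = bidirectional_margin_loss(anchor, positive, [0.8, 0.6])
too_close = bidirectional_margin_loss(anchor, positive, [1.0, 0.0])
too_far = bidirectional_margin_loss(anchor, positive, [0.0, 1.0])
```

In this sketch a soft negative is penalized in both directions: when it is indistinguishable from the positive (`too_close`) and when it is treated like an ordinary, unrelated negative (`too_far`). This is one way such samples could sit alongside the positive and negative pairs of a standard contrastive objective.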