Assessing ways in which Language Models can reduce hallucinations and improve the quality of their outputs is crucial to enabling their large-scale use. However, methods such as fine-tuning on domain-specific data or training a separate \textit{ad hoc} verifier demand computational resources that are not feasible for many user applications and constrain the models to specific fields of knowledge. In this thesis, we propose a dialectic pipeline that preserves LLMs' generalization abilities while improving the quality of their answers via self-dialogue, enabling the model to reflect upon and correct tentative wrong answers. We experiment with different pipeline settings, testing the proposed method on several datasets and on different families of models. All pipeline stages are enriched with the relevant context (in an oracle-RAG setting), and we study the impact of summarizing or filtering this context. We find that our dialectic pipeline outperforms standard model answers by significant margins and consistently achieves higher performance than Chain-of-Thought prompting alone.