Hallucination remains one of the most critical obstacles to the institutional adoption of Large Language Models (LLMs). In this context, an overwhelming number of studies have focused on the post-generation phase: refining outputs via feedback, analyzing logit values, or deriving clues from the outputs' artifacts. We propose HalluciBot, a model that predicts the probability of hallucination $\textbf{before generation}$ for any query posed to an LLM. In essence, HalluciBot invokes no generation during inference. To derive empirical evidence for HalluciBot, we employ a Multi-Agent Monte Carlo Simulation with a Query Perturbator that crafts $n$ variations per query at train time. The construction of our Query Perturbator is motivated by our introduction of a new definition of hallucination: $\textit{truthful hallucination}$. Our training methodology generated 2,219,022 estimates for a training corpus of 369,837 queries, spanning 13 diverse datasets and 3 question-answering scenarios. HalluciBot predicts both binary and multi-class probabilities of hallucination, providing a means to judge a query's quality with regard to its propensity to hallucinate. HalluciBot thus paves the way to revise or cancel a query before generation, averting the ensuing computational waste. Moreover, it offers a lucid means to measure user accountability for hallucinatory queries.
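To make the train-time labeling concrete, the following is a minimal sketch of how a Monte Carlo hallucination rate could be estimated from $n$ query perturbations. The function names `perturb_query`, `llm_answer`, and `is_hallucination` are hypothetical stand-ins, not the paper's actual components; the real Query Perturbator and multi-agent setup are LLM-based and not reproduced here.

```python
from typing import Callable, List

def estimate_hallucination_rate(
    query: str,
    perturb_query: Callable[[str, int], List[str]],  # hypothetical: crafts n query variations
    llm_answer: Callable[[str], str],                # hypothetical: one agent's generation
    is_hallucination: Callable[[str, str], bool],    # hypothetical: grades (query, answer) vs. ground truth
    n: int = 5,
) -> float:
    """Empirical hallucination probability for one query: the fraction of
    the original query plus its n perturbations that yields a wrong answer."""
    variants = [query] + perturb_query(query, n)
    outcomes = [is_hallucination(q, llm_answer(q)) for q in variants]
    return sum(outcomes) / len(outcomes)

# At train time, a rate like this serves as the supervision signal; HalluciBot
# then learns to predict it from the query text alone, so no generation is
# needed at inference.
```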