The surprisingly likely criterion from Prelec's seminal work on the Bayesian Truth Serum guarantees truthfulness in a game-theoretic multi-agent setting by rewarding rational agents for maximizing the expected information gain of their answers with respect to their probabilistic beliefs. We investigate the relevance of a similar criterion for the responses of LLMs. We hypothesize that if the surprisingly likely criterion works in LLMs, then under certain conditions the responses that maximize the reward under this criterion should be more accurate than the responses that only maximize the posterior probability. Using benchmarks including TruthfulQA and openly available LLMs (GPT-2 and LLaMA-2), we show that the method indeed improves accuracy significantly (for example, up to 24 percentage points of aggregate improvement on TruthfulQA and up to 70 percentage points of improvement on individual categories of questions).
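To make the contrast between the two selection rules concrete, the following is a minimal sketch rather than the method evaluated in the paper. The abstract does not give the scoring formula, so the sketch assumes that the "surprisingly likely" reward of an answer is the log-ratio of its posterior probability to a prior probability, loosely in the spirit of the information score in the Bayesian Truth Serum. The function name `pick_answers` and the toy probabilities are hypothetical; in practice both distributions would have to be derived from an LLM's output probabilities.

```python
import math


def pick_answers(posterior, prior):
    """Return (max-posterior answer, surprisingly-likely answer).

    ASSUMPTION: the surprisingly-likely score is log(posterior / prior).
    This scoring rule is illustrative and is not taken from the abstract.
    Both arguments map each candidate answer to a strictly positive probability.
    """
    # Baseline rule: choose the answer with the highest posterior probability.
    map_answer = max(posterior, key=posterior.get)

    # Assumed rule: choose the answer whose posterior most exceeds its prior.
    def surprise(answer):
        return math.log(posterior[answer]) - math.log(prior[answer])

    sl_answer = max(posterior, key=surprise)
    return map_answer, sl_answer


# Toy numbers (made up for illustration, not experimental data).
posterior = {"A": 0.5, "B": 0.4, "C": 0.1}
prior = {"A": 0.7, "B": 0.2, "C": 0.1}

map_answer, sl_answer = pick_answers(posterior, prior)
print(map_answer, sl_answer)
```

With these toy numbers the two rules disagree: answer A has the highest posterior, but B is the only answer whose posterior exceeds its prior, so the assumed criterion selects B. This is the kind of case in which the hypothesis predicts a difference in accuracy.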