An evaluator, such as an LLM-as-a-judge, is trustworthy when there is an agreed-upon way to measure its performance as a labeller. Traditional approaches either test the evaluator against reference labels or assume that it somehow `knows' the correct labelling. Both approaches fail when references are unavailable: the former requires data, and the latter is an assumption, not evidence. To address this, we introduce the `No-Data Algorithm', which provably establishes trust in an evaluator without requiring any labelled data. The algorithm works by posing a sequence of challenges to the evaluator. We prove that after $r$ challenge rounds, with probability at least $1 - (1/4)^r$ it accepts an evaluator that knows the correct labels, and that it reliably flags untrustworthy ones. We present formal proofs of correctness, empirical tests, and applications to assessing trust in LLMs-as-judges for low-resource language labelling. Our work enables scientifically grounded evaluator trust in low-data domains, addressing a critical bottleneck for scalable, trustworthy LLM deployment.
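As a rough illustration of the stated guarantee (a minimal sketch, not the paper's algorithm; the helper name `acceptance_bound` is hypothetical), the snippet below tabulates how the lower bound $1 - (1/4)^r$ on the probability of accepting a knowledgeable evaluator tightens as the number of challenge rounds $r$ grows.

```python
# Sketch only: evaluates the abstract's acceptance guarantee 1 - (1/4)^r
# for a few round counts r. It does not implement the challenge protocol.

def acceptance_bound(rounds: int) -> float:
    """Lower bound on the probability of accepting an evaluator
    that knows the correct labels, after `rounds` challenge rounds."""
    return 1.0 - 0.25 ** rounds

if __name__ == "__main__":
    for r in (1, 2, 5, 10):
        print(f"r = {r:2d}: acceptance probability >= {acceptance_bound(r):.10f}")
```

Under this bound, a handful of rounds already pushes the acceptance probability very close to one (e.g. five rounds give at least $1 - 4^{-5} \approx 0.999$), which is why the procedure can be practical without any labelled data.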