In a parallel with the game of Twenty Questions, we present a method to determine whether two large language models (LLMs), accessed only as black boxes, are identical. The goal is to use a small set of benign binary questions, typically fewer than 20. We formalize the problem and first establish a baseline that randomly selects questions from known benchmark datasets, achieving nearly 100% accuracy within 20 questions. After deriving optimal bounds for this problem, we introduce two effective questioning heuristics that discriminate among 22 LLMs using half as many questions. These methods offer significant advantages in terms of stealth and are therefore of interest to auditors or copyright owners investigating suspected model leaks.
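The baseline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `model_a` and `model_b` are hypothetical stand-ins for black-box LLM APIs that return a binary answer to each question, and the toy oracles below use simple arithmetic in place of real model outputs.

```python
import random

def answers_differ(model_a, model_b, questions, budget=20, seed=0):
    """Baseline from the abstract (sketched): ask up to `budget` randomly
    chosen binary questions to both black-box models and declare them
    different on the first disagreement. A single disagreement proves the
    models are distinct; full agreement within the budget is evidence
    (not proof) that they are the same."""
    rng = random.Random(seed)
    for q in rng.sample(questions, min(budget, len(questions))):
        if model_a(q) != model_b(q):
            return True   # disagreement observed: models are distinct
    return False          # no disagreement within the question budget

# Toy deterministic oracles standing in for two LLMs' binary answers.
questions = list(range(100))
model_even = lambda q: q % 2 == 0
model_mult3 = lambda q: q % 3 == 0

print(answers_differ(model_even, model_even, questions))   # same oracle twice
print(answers_differ(model_even, model_mult3, questions))  # distinct oracles
```

With 20 random questions, two oracles that disagree on roughly half the question set are separated almost surely, which mirrors the near-100% baseline accuracy reported in the abstract.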