Remotely detecting that a deployed LLM has changed is a difficult problem. Existing methods are either too expensive to deploy at scale, or require initial white-box access to model weights or grey-box access to log probabilities. We aim to achieve both low cost and strict black-box operation, observing only output tokens. Our approach hinges on specific inputs we call border inputs, for which more than one token is tied as the top output token. From a statistical perspective, optimal change detection depends on the model's Jacobian and the Fisher information of the output distribution. Analyzing these quantities in low-temperature regimes shows that border inputs enable powerful change detection tests. Building on this insight, we propose the Black-Box Border Input Tracking (B3IT) scheme. Extensive in-vivo and in-vitro experiments show that border inputs are easily found for the non-reasoning endpoints we tested, and that they achieve performance on par with the best available grey-box approaches. B3IT reduces costs by $30\times$ compared to existing methods, while operating in a strict black-box setting.
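To make the low-temperature intuition concrete, the following toy sketch compares how far the output distribution moves under a small logit shift for a border input and for an ordinary input. It is not the B3IT procedure: the logit values, the temperature, the shift size, and the helper names (`softmax`, `total_variation`, `detectability`) are all assumptions chosen for illustration, and total variation distance stands in for the Fisher-information analysis.

```python
import math


def softmax(logits, temperature):
    """Temperature-scaled softmax over a list of logits (numerically stable)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def detectability(logits, delta, temperature):
    """Distance between the output distributions before and after a
    hypothetical model change that shifts the first logit by `delta`."""
    before = softmax(logits, temperature)
    shifted = [logits[0] + delta] + list(logits[1:])
    after = softmax(shifted, temperature)
    return total_variation(before, after)


# Assumed values, for illustration only.
TEMPERATURE = 0.1   # low-temperature sampling
DELTA = 0.05        # small logit perturbation standing in for a model change

border_logits = [5.0, 5.0, 1.0]    # two tokens tied for the top: a border input
ordinary_logits = [5.0, 3.0, 1.0]  # one clear winner: an ordinary input

tv_border = detectability(border_logits, DELTA, TEMPERATURE)
tv_ordinary = detectability(ordinary_logits, DELTA, TEMPERATURE)

print(f"border input:   TV = {tv_border:.3e}")
print(f"ordinary input: TV = {tv_ordinary:.3e}")
```

Under these assumed numbers, the border input's sampled-token distribution shifts by roughly 0.12 in total variation, while the ordinary input's shift stays below 1e-8. That gap is why an observer who sees only output tokens would need far fewer samples to notice the change on a border input.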