As Large Language Model (LLM) APIs become ubiquitous, users increasingly rely on black-box fingerprinting to verify that providers are serving the advertised premium models. However, these methods may overlook adversarial providers who manipulate model weights to deceive the fingerprinting process. We introduce a novel threat, termed fingerprint spoofing, in which a malicious provider stealthily serves a weaker model that has been parameter-efficiently fine-tuned to mimic a stronger model, thereby evading user-side fingerprinting. We first prove formally that user-side resource constraints (i.e., finite query budgets and weak fingerprinting classifiers) leave current fingerprinting methods vulnerable to fingerprint spoofing. Guided by this theoretical analysis, we propose GhostPrint, a cost-effective attack framework that combines surrogate modeling, reward-ranked fine-tuning, and knowledge distillation. Extensive evaluations in both static and continual fingerprinting settings demonstrate that GhostPrint enables weak models to consistently bypass representative fingerprinting methods while maintaining utility at a low fine-tuning cost, exposing a critical vulnerability in current LLM fingerprinting pipelines.