Visual prompting (VP) is a recent technique that adapts a frozen model, pre-trained on a source-domain task, to a target-domain task. This study examines how VP can benefit black-box, model-level backdoor detection. The visual prompt learned in VP maps class subspaces between the source and target domains. We identify a misalignment, which we term class subspace inconsistency, between clean and poisoned datasets. Based on this observation, we introduce \textsc{BProm}, a black-box, model-level method that detects whether a suspicious model contains a backdoor. \textsc{BProm} exploits the fact that a prompted model achieves low classification accuracy when a backdoor is present. Extensive experiments confirm \textsc{BProm}'s effectiveness.
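The detection signal described above, low accuracy of the prompted model, can be illustrated with a minimal sketch. This is not \textsc{BProm} itself: the names `prompted_accuracy`, `looks_backdoored`, `query_model`, and `apply_prompt` are hypothetical, and the fixed accuracy threshold is an assumption made only for illustration, since the abstract does not state how the final decision is made. The sketch assumes black-box access, meaning that only the model's predicted labels are observed.

```python
from typing import Any, Callable, Sequence


def prompted_accuracy(
    query_model: Callable[[Any], int],   # hypothetical: black-box model, returns a predicted label
    apply_prompt: Callable[[Any], Any],  # hypothetical: applies an already-learned visual prompt
    inputs: Sequence[Any],
    labels: Sequence[int],
) -> float:
    """Accuracy of a black-box model on target-domain inputs after prompting."""
    if len(inputs) != len(labels) or len(labels) == 0:
        raise ValueError("inputs and labels must be non-empty and of equal length")
    correct = 0
    for x, y in zip(inputs, labels):
        # Only the predicted label is used; no weights or gradients are accessed.
        correct += int(query_model(apply_prompt(x)) == y)
    return correct / len(labels)


def looks_backdoored(accuracy: float, threshold: float = 0.5) -> bool:
    """Flag the model when prompted accuracy is low.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return accuracy < threshold


# Toy stand-ins: the "model" returns the parity of an integer input,
# and the "prompt" is the identity function.
toy_acc = prompted_accuracy(lambda x: x % 2, lambda x: x, [0, 1, 2, 3], [0, 1, 0, 0])
print(toy_acc, looks_backdoored(toy_acc))
```

In practice the prompt would first be learned on a target-domain dataset through queries to the suspicious model, and the threshold would have to be calibrated, for example against reference models known to be clean.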