We introduce a formal statistical definition for the problem of backdoor detection in machine learning systems and use it to analyze the feasibility of such problems, providing evidence for the utility and applicability of our definition. The main contributions of this work are an impossibility result and an achievability result for backdoor detection. We prove a no-free-lunch theorem showing that universal (adversary-unaware) backdoor detection is impossible, except for very small alphabet sizes. We therefore argue that backdoor detection methods need to be either explicitly or implicitly adversary-aware. However, our work does not imply that backdoor detection cannot work in specific scenarios, as evidenced by successful backdoor detection methods in the scientific literature. Furthermore, we connect our definition to the probably approximately correct (PAC) learnability of the out-of-distribution detection problem.