Neural networks are powered by an implicit bias: a tendency of gradient descent to fit training data in a way that generalizes to unseen data. A recent class of neural network models gaining increasing popularity is structured state space models (SSMs), regarded as an efficient alternative to transformers. Prior work argued that the implicit bias of SSMs leads to generalization in a setting where data is generated by a low-dimensional teacher. In this paper, we revisit this setting and formally establish a phenomenon entirely undetected by prior work on the implicit bias of SSMs. Namely, we prove that while implicit bias leads to generalization under many choices of training data, there exist special examples whose inclusion in training completely distorts the implicit bias, to the point where generalization fails. This failure occurs despite the special training examples being labeled by the teacher, i.e., having clean labels! We empirically demonstrate the phenomenon, with SSMs trained independently and as part of non-linear neural networks. In the area of adversarial machine learning, disrupting generalization with cleanly labeled training examples is known as clean-label poisoning. Given the proliferation of SSMs, we believe that delineating their susceptibility to clean-label poisoning, and developing methods for overcoming this susceptibility, are critical research directions to pursue.