The literature on provable robustness in machine learning has primarily focused on static prediction problems, such as image classification, in which input samples are assumed to be independent and model performance is measured as an expectation over the input distribution. Robustness certificates are derived for individual input instances with the assumption that the model is evaluated on each instance separately. However, in many deep learning applications such as online content recommendation and stock market analysis, models use historical data to make predictions. Robustness certificates based on the assumption of independent input samples are not directly applicable in such scenarios. In this work, we focus on the provable robustness of machine learning models in the context of data streams, where inputs are presented as a sequence of potentially correlated items. We derive robustness certificates for models that use a fixed-size sliding window over the input stream. Our guarantees hold for the average model performance across the entire stream and are independent of stream size, making them suitable for large data streams. We perform experiments on speech detection and human activity recognition tasks and show that our certificates can produce meaningful performance guarantees against adversarial perturbations.
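To make the setting concrete, the following is a minimal sketch of the evaluation protocol the abstract describes: a model that sees only a fixed-size sliding window over the stream, scored by its average performance across the whole stream. It does not compute a robustness certificate, and it is not the authors' implementation. The names `average_windowed_accuracy`, `predict`, and `window_size`, the use of accuracy as the performance measure, and the choice to skip positions before the window fills are all assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Tuple


def average_windowed_accuracy(
    stream: Iterable[Tuple[Hashable, Hashable]],
    predict: Callable[[tuple], Hashable],
    window_size: int,
) -> float:
    """Average accuracy of a sliding-window model over a stream.

    `stream` yields (item, label) pairs. `predict` is a hypothetical model
    that maps the last `window_size` items to a label for the newest item.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")

    # deque(maxlen=...) discards the oldest item once the window is full.
    window = deque(maxlen=window_size)
    correct = 0
    total = 0
    for item, label in stream:
        window.append(item)
        if len(window) < window_size:
            continue  # assumption: no prediction until the window is full
        correct += int(predict(tuple(window)) == label)
        total += 1

    # An empty or too-short stream yields no predictions.
    return correct / total if total else 0.0


def majority_vote(window: tuple) -> int:
    """Toy stand-in for a model: predict 1 if most items in the window are 1."""
    return int(2 * sum(window) > len(window))


toy_stream = [(0, 0), (1, 0), (1, 1), (1, 1), (0, 0)]
accuracy = average_windowed_accuracy(toy_stream, majority_vote, window_size=3)
```

Because each prediction depends on several consecutive items, perturbing a single stream item can change up to `window_size` predictions, which is why per-instance certificates that assume independent inputs do not transfer directly to this setting.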