Concept Drift Detection using Ensemble of Integrally Private Models

Deep neural networks (DNNs) are among the most widely used machine learning algorithms. DNNs require the training data, together with true labels, to be available beforehand. This is not feasible for many real-world problems, where data arrives as a stream and true labels are scarce and expensive to acquire. The literature has paid little attention to the privacy aspect of streaming data, whose distribution may change frequently. These concept drifts must be detected privately to avoid any disclosure risk from DNNs. Existing privacy models use concept drift detection schemes such as ADWIN and KSWIN to detect drifts. In this paper, we use the notion of integrally private DNNs to detect concept drift. Integrally private DNNs are models that recur frequently across different datasets. Based on this, we introduce an ensemble methodology, which we call the 'Integrally Private Drift Detection' (IPDD) method, to detect concept drift from private models. Our IPDD method does not require labels to detect drift, but it assumes that true labels become available once a drift has been detected. We have experimented with binary and multi-class synthetic and real-world data. Our experimental results show that our methodology can privately detect concept drift, has utility comparable to (and in some cases better than) ADWIN, and outperforms differentially private models at different privacy levels in terms of utility. The source code for the paper is available \hyperlink{https://github.com/Ayush-Umu/Concept-drift-detection-Using-Integrally-private-models}{here}.
