In this research, we use adversarial sampling to test the fairness of a deep neural network's predictions across the different classes of images in a given dataset. Several frameworks have been proposed to make machine learning models robust against adversarial attacks, adversarial training among them, but adversarial training tends to cause disparities in accuracy and robustness between groups. We demonstrate a new method of ensuring fairness across groups of inputs in a deep neural network classifier. We trained our model on the original images only, without training on perturbed or attacked images. When fed adversarial samples, the model was able to predict the original class of the image each adversarial sample was derived from. We also applied the separation-of-concerns concept from software engineering: an additional standalone filter layer heavily removes the noise or attack from a perturbed image before automatically passing it to the network for classification. With this design we achieved an accuracy of 93.3%. Because the CIFAR-10 dataset has ten classes, we tested our hypothesis on each class to account for fairness and obtained consistent results and accuracy.
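The filter-then-classify pipeline and the per-class fairness check described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which filter is used, so a 3x3 median filter stands in for it, and the names `median_filter3`, `classify_with_filter`, `per_class_accuracy`, and `accuracy_gap` are hypothetical. The `classifier` argument is assumed to be any callable that maps a batch of images to predicted class indices.

```python
import numpy as np


def median_filter3(img):
    """Apply a 3x3 median filter per channel to an (H, W, C) image.

    Assumption: a median filter is only a stand-in for the paper's
    unspecified noise-removing filter layer.
    """
    h, w = img.shape[:2]
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    # Stack the nine shifted views of the image and take the median over them.
    windows = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    return np.median(windows, axis=0)


def classify_with_filter(classifier, images):
    """Standalone filter stage followed by the unchanged classifier."""
    filtered = np.stack([median_filter3(im) for im in images])
    return classifier(filtered)


def per_class_accuracy(y_true, y_pred, num_classes):
    """Accuracy computed separately for each true class (NaN if class absent)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():
            acc[c] = np.mean(y_pred[mask] == c)
    return acc


def accuracy_gap(per_class_acc):
    """Hypothetical disparity measure: best minus worst per-class accuracy."""
    return float(np.nanmax(per_class_acc) - np.nanmin(per_class_acc))
```

For CIFAR-10, `num_classes` would be 10, and a small `accuracy_gap` on filtered adversarial samples would correspond to the consistent per-class results the abstract reports. Keeping the filter outside the classifier reflects the separation-of-concerns idea: the network is trained on clean images only and is not modified to handle perturbations.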