Image data are used extensively in Deep Neural Network (DNN) tasks such as autonomous driving and medical image analysis, which raises significant privacy concerns. Existing privacy protection techniques cannot protect such data efficiently. For example, Differential Privacy (DP), an emerging technique that offers strong privacy guarantees, cannot effectively protect the visual features of an exposed image dataset. In this paper, we propose VisualMixer, a novel privacy-preserving framework that protects the training data of visual DNN tasks by pixel shuffling, without injecting any noise. VisualMixer uses a new privacy metric, Visual Feature Entropy (VFE), to quantify the visual features of an image from both biological and machine vision perspectives. Within VisualMixer, we devise a task-agnostic image obfuscation method that protects the visual privacy of data for DNN training and inference. For each image, the method determines the regions in which to shuffle pixels, and the sizes of those regions, according to the desired VFE. Within these regions it shuffles pixels in both the spatial domain and the chromatic channel space, without injecting noise, so that visual features cannot be discerned or recognized, while incurring negligible accuracy loss. Extensive experiments on real-world datasets demonstrate that VisualMixer effectively preserves visual privacy with negligible accuracy loss (2.35 percentage points of model accuracy on average) and almost no performance degradation in model training.
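To make the idea of noise-free pixel shuffling concrete, the following is a minimal sketch and not the VisualMixer implementation. It assumes a fixed window size, whereas the abstract describes region locations and sizes chosen per image from the desired VFE; the VFE computation itself is not attempted here. The function name `shuffle_windows` and its parameters `window` and `seed` are hypothetical, and the image is represented as plain nested lists so the example runs without third-party libraries.

```python
import random


def shuffle_windows(image, window=4, seed=0):
    """Return a copy of `image` with values permuted inside each window.

    `image` is a list of rows, each row a list of pixels, and each pixel a
    list of channel values (for example [R, G, B]). The fixed `window` size
    is an illustrative assumption, not the VFE-driven sizing from the paper.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    channels = len(image[0][0])
    out = [[list(pixel) for pixel in row] for row in image]

    for top in range(0, height, window):
        for left in range(0, width, window):
            # Every (row, column, channel) slot inside the current window.
            slots = [
                (r, c, ch)
                for r in range(top, min(top + window, height))
                for c in range(left, min(left + window, width))
                for ch in range(channels)
            ]
            values = [image[r][c][ch] for r, c, ch in slots]
            # Permutation only: values move across positions and channels,
            # but no value is altered and nothing is added.
            rng.shuffle(values)
            for (r, c, ch), value in zip(slots, values):
                out[r][c][ch] = value
    return out
```

Because each window is only permuted, the multiset of values in every window (and therefore in the whole image) is unchanged, which is one simple way to read "without injecting noise". The `seed` argument makes the permutation reproducible for a given input.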