Representations of the activation space of deep neural networks (DNNs) are widely used for tasks such as natural language processing, anomaly detection, and speech recognition. Because these tasks are diverse and DNNs are large, an efficient, task-independent representation of activations is crucial. Empirical p-values have been used to quantify the relative strength of an observed node activation compared with activations produced by already-known inputs. However, retaining the raw data needed for these calculations increases memory consumption and raises privacy concerns. To address this, we propose a model-agnostic framework for representing DNN activations that uses node-specific histograms to compute p-values of observed activations without retaining already-known inputs. The approach shows promise when validated on multiple network architectures across various downstream tasks and compared against kernel density estimation and brute-force empirical baselines. The framework reduces memory usage by 30% and computes p-values up to 4 times faster, while maintaining state-of-the-art detection power in downstream tasks such as detecting adversarial attacks and synthesized content. Moreover, because no raw data is persisted at inference time, the framework could reduce susceptibility to attacks and privacy issues.
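To make the idea of histogram-based p-values concrete, the following is a minimal sketch and not the authors' implementation. It assumes right-tailed p-values, equal-width bins, and add-one smoothing. The function names `fit_node_histograms` and `histogram_pvalues` and the parameter `n_bins` are hypothetical and chosen for illustration only. Each node's background activations are summarized once as bin edges plus right-tail counts, after which the raw activations are no longer needed.

```python
import numpy as np


def fit_node_histograms(background, n_bins=50):
    """Summarize background activations of shape (n_samples, n_nodes).

    Returns per-node bin edges, shape (n_nodes, n_bins + 1), and right-tail
    counts, shape (n_nodes, n_bins + 1), where tail[j, k] is the number of
    background activations of node j in bin k or any higher bin
    (tail[j, n_bins] is 0). The raw activations can be discarded afterwards.
    """
    background = np.asarray(background, dtype=float)
    n_nodes = background.shape[1]
    edges = np.empty((n_nodes, n_bins + 1))
    tail = np.zeros((n_nodes, n_bins + 1), dtype=np.int64)
    for j in range(n_nodes):
        counts, edges[j] = np.histogram(background[:, j], bins=n_bins)
        # Reverse cumulative sum: mass at or above each bin.
        tail[j, :n_bins] = np.cumsum(counts[::-1])[::-1]
    return edges, tail


def histogram_pvalues(observed, edges, tail):
    """Right-tailed p-value per node for one observed activation vector."""
    observed = np.asarray(observed, dtype=float)
    n_nodes, n_bins = tail.shape[0], tail.shape[1] - 1
    pvals = np.empty(n_nodes)
    for j in range(n_nodes):
        x = observed[j]
        # Bin containing x. Values below the histogram range map to bin 0
        # (p = 1); values above it map to the empty tail (smallest p-value).
        k = int(np.searchsorted(edges[j], x, side="right")) - 1
        if x == edges[j, -1]:
            k = n_bins - 1  # np.histogram closes the last bin on the right
        k = min(max(k, 0), n_bins)
        n_background = tail[j, 0]  # total count stored in the first tail entry
        # Assumption: add-one smoothing so the p-value is never exactly zero.
        pvals[j] = (1 + tail[j, k]) / (1 + n_background)
    return pvals
```

Because the whole bin containing the observed value is counted in the tail, this estimate is never smaller than the brute-force empirical p-value computed from the raw activations, and the gap is bounded by the mass of a single bin. Storage per node is fixed by `n_bins` rather than growing with the number of background inputs.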