Applying Shapley values to high-dimensional, time-series-like data is computationally challenging and sometimes infeasible: for $N$ inputs, exact computation requires evaluating on the order of $2^N$ input subsets. In image processing, clusters of pixels, referred to as superpixels, are used to streamline computations. This research presents an efficient solution for time-series-like data that adapts the idea of superpixels for Shapley value computation. Motivated by a forensic DNA classification example, the method is applied to multivariate time-series-like data whose features have been classified by a convolutional neural network (CNN). In DNA processing, it is important to distinguish alleles from the background noise created by DNA extraction and processing. A single DNA profile has 31,200 scan points to classify, and the classification decisions must be defensible in a court of law. As a result, classification is routinely performed by human readers, a monumental and time-consuming process. A CNN combined with fast computation of meaningful Shapley values offers a potential alternative to manual classification. This research demonstrates the realistic, accurate and fast computation of Shapley values for this massive task.
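To make the cost argument concrete, the following is a minimal sketch of the general idea of grouping scan points into contiguous segments (a one-dimensional analogue of superpixels) and computing exact Shapley values over the segments instead of over individual points. It is not the method of this research: the function name `segment_shapley`, the equal-width segmentation, the baseline-replacement masking and the toy `sum` model are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import factorial


def segment_shapley(model, x, baseline, n_segments):
    """Exact Shapley values with contiguous segments of x as the players.

    Hypothetical helper for illustration only. `model` maps a list of
    floats to a single float; `baseline` supplies the values used for
    segments that are "absent" from a coalition.
    """
    n = len(x)
    # Assumption: equal-width contiguous segments over the time axis.
    bounds = [k * n // n_segments for k in range(n_segments + 1)]
    segments = [range(bounds[k], bounds[k + 1]) for k in range(n_segments)]

    def value(coalition):
        # Segments in the coalition keep their observed values;
        # every other segment is replaced by the baseline.
        masked = list(baseline)
        for j in coalition:
            for t in segments[j]:
                masked[t] = x[t]
        return model(masked)

    phi = [0.0] * n_segments
    for i in range(n_segments):
        others = [j for j in range(n_segments) if j != i]
        for size in range(n_segments):
            # Standard Shapley weight |S|! (M - |S| - 1)! / M!
            weight = (factorial(size) * factorial(n_segments - size - 1)
                      / factorial(n_segments))
            for subset in combinations(others, size):
                # Marginal contribution of segment i to coalition `subset`.
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


# Toy example: 8 scan points grouped into 4 segments, so there are
# 2**4 = 16 distinct coalitions instead of 2**8 = 256.
# (No caching is done here, so the model is called more often than that.)
signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
phi = segment_shapley(sum, signal, [0.0] * 8, n_segments=4)
```

Because the toy model is additive, each segment's Shapley value is approximately the sum of its own points, and the values add up to the difference between the model output on the signal and on the baseline. With a real classifier such as a CNN, `model` would wrap the network's output for the class of interest, and the choice of segmentation and baseline would need to be justified for the data at hand.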