Discovering patterns in data that best describe the differences between classes allows us to hypothesize and reason about class-specific mechanisms. In molecular biology, for example, this holds promise for advancing the understanding of cellular processes that differ between tissues or diseases, which could lead to novel treatments. To be useful in practice, methods that find such differential patterns have to be readily interpretable by domain experts and scalable to extremely high-dimensional data. In this work, we propose DiffNaps, a novel, inherently interpretable binary neural network architecture that extracts differential patterns from data. DiffNaps scales to hundreds of thousands of features and is robust to noise, thus overcoming the limitations of current state-of-the-art methods in large-scale applications such as those in biology. We show on synthetic and real-world data, including three biological applications, that, unlike its competitors, DiffNaps consistently yields accurate, succinct, and interpretable class descriptions.